It is proved that nonparametric autoregression is asymptotically equivalent in the sense of Le Cam's deficiency distance to nonparametric regression with random design as well as with regular nonrandom design.

1. Introduction. We assume that observations $X_0,\dots,X_n$ from a stationary autoregressive process $(X_i)_{i=0,\dots,n}$ are available which obey the model equation

$$X_i = f(X_{i-1}) + \varepsilon_i, \qquad i = 1,\dots,n, \tag{1}$$

where $(\varepsilon_i)_{i=1,\dots,n}$ are i.i.d. random variables. The unknown autoregression function $f$ is then the target of statistical inference, and the development of efficient estimators is a natural task for theoretically oriented statisticians.
On the one hand, it has long been recognized that commonly used estimators in model (1) have the same asymptotic behavior as the corresponding estimators in nonparametric regression. A result of Robinson [26] concerns the pointwise equivalence of nonparametric kernel estimators, and Neumann and Kreiss [22] extended this equivalence to the global behavior of nonparametric estimators. On the other hand, despite these well-known similarities between estimators, there is still a certain discrepancy between the available theory in the two contexts. Nonparametric regression has a very well developed asymptotic theory for optimal estimation, even up to the level of exact asymptotics (see, e.g., [13] or [24] for an overview). Considerably less theory is available for nonparametric autoregression.
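The following minimal Python sketch simulates model (1) to make it concrete. It assumes, purely for illustration, standard Gaussian innovations and the bounded function $f(x)=\sin x$. The burn-in loop is only a rough stand-in for starting the chain in its stationary regime, which the paper imposes exactly through assumption (A1).

```python
import math
import random


def simulate_autoregression(f, n, burn_in=500, seed=0):
    """Simulate X_0, ..., X_n from X_i = f(X_{i-1}) + eps_i with i.i.d. N(0, 1) innovations.

    The first `burn_in` steps are discarded so that X_0 is approximately drawn from
    the stationary distribution; this only approximates exact stationarity.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):
        x = f(x) + rng.gauss(0.0, 1.0)
    path = [x]  # plays the role of X_0
    for _ in range(n):
        x = f(x) + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Hypothetical bounded autoregression function: f(x) = sin(x), so sup |f| <= 1.
path = simulate_autoregression(math.sin, n=1000)
# Recover the innovations eps_i = X_i - f(X_{i-1}), i = 1, ..., n.
innovations = [path[i] - math.sin(path[i - 1]) for i in range(1, len(path))]
print(len(path), len(innovations))
```

Choosing a bounded $f$ mirrors the class $\mathcal F$ introduced in Section 2. Boundedness is what keeps the chain from drifting away.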
The purpose of the present paper is to bridge this gap between nonparametric regression and autoregression by showing asymptotic equivalence on an abstract level. The theory of asymptotic equivalence of statistical experiments was developed in Le Cam's work [19]. In the framework of nonparametric statistics, Brown and Low [4] proved that the Gaussian white noise experiment and nonparametric regression with nonrandom design and Gaussian errors are asymptotically equivalent, in the sense that Le Cam's deficiency distance between them tends to zero. In [12, 14, 15, 23] the scope of asymptotic equivalence was extended to the nonparametric density estimation problem and to nonparametrically driven regression models. Moreover, asymptotic equivalence of nonparametric regression with random design and Gaussian white noise was shown in [2], and asymptotic equivalence of Poisson processes and Gaussian white noise was established in [3]. The issue of constructive asymptotic equivalence is considered in [25] and [5]. The asymptotic equivalence of a close relative of nonparametric autoregression, a diffusion experiment parametrized by the drift function, to Gaussian white noise experiments is proved in [8] and [7]. Milstein and Nussbaum [21] showed asymptotic equivalence of a nonparametric statistical model of small diffusion type and its discretization by a stochastic Euler difference scheme. These models deal with dependent observations in continuous time. Asymptotic equivalence for models with dependent observations in discrete time and non-Gaussian noise, however, seems to be a much more difficult issue.

In this paper we establish local equivalence of nonparametric autoregression (1) and nonparametric regression in the discrete-time setting. That is, the set of possible functions lies in a class $\Sigma^n_{f_0}$ centered around some fixed function $f_0$ and shrinking in some appropriate norm as $n\to\infty$. Depending on additional prior smoothness assumptions on $f$, this class will nevertheless be rich enough for the transfer of minimax lower bounds from one model to the other. Under mild regularity assumptions stated below, the process $(X_i)_{i=0,\dots,n}$ corresponding to $f_0$ has a stationary density $\psi_{f_0}$, say. We show asymptotic equivalence of the experiment given by (1) to nonparametric regression with random design as well as with regular nonrandom design. The former experiment corresponds to i.i.d. observations $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ with

$$Y_i = f(\xi_i) + \eta_i, \qquad i = 1,\dots,n, \tag{2}$$

where $E(\eta_i\mid\xi_i) = 0$. The basic assumption on the errors $\eta_i$ is that their Fisher information is the same as that of the $\varepsilon_i$'s. This includes the case of Gaussian errors as well as of errors having the same distribution as the $\varepsilon_i$. The $\xi_i$ are distributed according to the stationary density $\psi_{f_0}$ of the process corresponding to the central function $f_0$, regardless of the actual value of $f$.
We also show equivalence to nonparametric regression with regular nonrandom design, which corresponds to independent observations $Y_{n,1},\dots,Y_{n,n}$ obeying the model

$$Y_{n,i} = f(t_{n,i}) + \eta_i, \qquad i = 1,\dots,n, \tag{3}$$

where $E\eta_i = 0$. Here we assume that the design points are regularly spaced with density $\psi_{f_0}$, that is, $\int_{-\infty}^{t_{n,i}}\psi_{f_0}(x)\,dx = (i-1/2)/n$. We assume again that the Fisher information of $\eta_i$ is the same as the Fisher information of $\varepsilon_i$. Since Le Cam's equivalence relation is transitive, we also obtain, as an immediate by-product, asymptotic equivalence of nonparametric regression with random and with regular nonrandom design. In the special case of Gaussian errors, but under weaker smoothness assumptions on $f$, this equivalence also follows from two known results. These are the asymptotic equivalence of nonparametric regression with nonrandom design and Gaussian white noise [4], and that of nonparametric regression with random design and Gaussian white noise [2].
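The regular design in (3) places $t_{n,i}$ at the $(i-1/2)/n$-quantile of $\psi_{f_0}$. The sketch below computes such points by bisection on a distribution function. Since $\psi_{f_0}$ is generally not available in closed form, the standard normal distribution function is used here as a hypothetical stand-in.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function (stand-in for the cdf of psi_{f_0})."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantile_design(n, cdf, lo=-50.0, hi=50.0, tol=1e-12):
    """Return t_{n,1} <= ... <= t_{n,n} with cdf(t_{n,i}) = (i - 1/2)/n, found by bisection."""
    points = []
    for i in range(1, n + 1):
        target = (i - 0.5) / n
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if cdf(mid) < target:
                a = mid
            else:
                b = mid
        points.append(0.5 * (a + b))
    return points


design = quantile_design(4, normal_cdf)
print([round(t, 4) for t in design])
```

Bisection only requires the distribution function to be monotone. It therefore also works if $\psi_{f_0}$ is known only through a numerically integrated cdf.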

At the end of Section 2 we discuss briefly how our results on asymptotic 
equivalence can be used to transfer well-known lower asymptotic bounds for 
the minimax risk in nonparametric regression to the case of nonparametric 
autoregression. Our local version of asymptotic equivalence does not allow 
an immediate transfer of upper asymptotic bounds; however, they could be 
independently proved by appeal to strong approximations of nonparametric 
estimators in both models (see [22] for details) or by direct computation of 
the risk of asymptotically optimal estimators. 



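2. Assumptions and main results. We start by introducing an appropriate functional parameter set. Consider the set of functions

$$\mathcal F = \Big\{ f:\mathbb R\to\mathbb R : \sup_x|f(x)| \le M \Big\},$$

where $M<\infty$ is a constant. For any constants $\beta>0$ and $L>0$, let $\mathcal H = \mathcal H(\beta,L)$ be a Hölder ball, that is, the set of functions $f:\mathbb R\to\mathbb R$ satisfying

$$|f| \le L, \qquad \big|f^{(\lfloor\beta\rfloor)}(x) - f^{(\lfloor\beta\rfloor)}(y)\big| \le L|x-y|^{\beta-\lfloor\beta\rfloor}, \qquad x,y\in\mathbb R.$$

Here $\lfloor\beta\rfloor$ denotes the largest integer strictly less than $\beta$. The set of functional parameters is defined as $\Sigma = \mathcal F\cap\mathcal H$.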
Let $X_0$ be a random variable on the probability space $(\Omega,\mathcal A,P)$. Assume that we observe a sequence $X_1,\dots,X_n$ which obeys

$$X_i = f(X_{i-1}) + \varepsilon_i, \qquad i = 1,\dots,n, \tag{4}$$
where $\varepsilon_1,\dots,\varepsilon_n$ are i.i.d. with a given density $p$ that is continuous and positive on $\mathbb R$, and the function $f\in\Sigma$ is assumed to be unknown. It is easy to see that, for any $f\in\Sigma$, $P(X_i\in B\mid X_{i-1}=x)\ge\mu(B)$ holds for all $B\in\mathcal B$ and $x\in\mathbb R$. Here $\mu$ is some measure not depending on $f$, with $\mu(\mathbb R)=\mu_0>0$. From Theorem 2.4.1 in [10] it follows that the uniform mixing coefficients (see Section 6.2) decay geometrically. Therefore, there exists a stationary density, which we shall denote by $\psi_f$.
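One possible way to see the minorization claim is the following. Since $|f|\le M$ on $\Sigma$, the transition density satisfies $p(y-f(x))\ge\inf_{|u|\le M}p(y-u)$. Hence $\mu(B)=\int_B\inf_{|u|\le M}p(y-u)\,dy$ is a valid choice, and it does not depend on $f$. The sketch below evaluates $\mu_0=\mu(\mathbb R)$ numerically under the illustrative assumption that $p$ is the standard normal density. In that case the infimum equals $p(|y|+M)$ and $\mu_0=2(1-\Phi(M))$.

```python
import math


def phi(x):
    """Standard normal density (illustrative choice of the innovation density p)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def minorization_mass(M, lo=-40.0, hi=40.0, steps=200_000):
    """mu_0 = int inf_{|u| <= M} p(y - u) dy for p = phi, by the midpoint rule.

    For the symmetric, unimodal phi the infimum over |u| <= M is attained at the
    shift farthest from y, which gives phi(|y| + M).
    """
    h = (hi - lo) / steps
    return h * sum(phi(abs(lo + (k + 0.5) * h) + M) for k in range(steps))


M = 1.0
numeric = minorization_mass(M)
closed_form = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(M / math.sqrt(2.0))))  # 2 (1 - Phi(M))
print(numeric, closed_form)
```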

Throughout the paper we shall assume that the observations (4) satisfy the following assumption:

(A1) The random variable $X_0$ has the stationary density $\psi_f(\cdot)$, which implies that the sequence $(X_i)_{i=0,\dots,n}$ is in the stationary regime.

Note that the stationary density satisfies $\int_B\psi_f(x)\,dx\ge\mu(B)$ for all $B\in\mathcal B$.

Before we can state our main results on the approximation of the nonparametric autoregressive model by a nonparametric regression model, we have to introduce the basic concepts of asymptotic equivalence. Let $\mathcal E^i_n = (\Omega^i_n,\mathcal A^i_n,\{P^i_{n,f}, f\in\Sigma'\})$, $i=1,2$, be two sequences of statistical experiments indexed by $f$ in a subset $\Sigma'\subset\Sigma$. The deficiency of $\mathcal E^1_n$ with respect to $\mathcal E^2_n$ is defined as

$$\delta(\mathcal E^1_n,\mathcal E^2_n) = \sup_L\inf_{\delta^{(1)}}\sup_{\delta^{(2)}}\sup_{f\in\Sigma'}\big|E^1_fL(f,\delta^{(1)}) - E^2_fL(f,\delta^{(2)})\big|.$$

The first supremum is taken over all decision problems with loss function $L$ satisfying $0\le L\le1$. The minimax value of the maximum difference in risks over $f\in\Sigma'$ is computed over all randomized statistical procedures $\delta^{(i)}$ for $\mathcal E^i_n$, $i=1,2$. According to Theorem 2 on page 15 in [20], the deficiency distance can alternatively be written as

$$\delta(\mathcal E^1_n,\mathcal E^2_n) = \inf_M\sup_{f\in\Sigma'}\tfrac12\big\|M\cdot P^1_{n,f} - P^2_{n,f}\big\|_{\mathrm{var}},$$

where $\|\cdot\|_{\mathrm{var}}$ denotes the total variation distance and the infimum is taken over all Markov kernels $M$ on $\Omega^1_n\times\mathcal A^2_n$. Le Cam's pseudodistance between $\mathcal E^1_n$ and $\mathcal E^2_n$ is

$$\Delta(\mathcal E^1_n,\mathcal E^2_n) = \max\{\delta(\mathcal E^1_n,\mathcal E^2_n),\delta(\mathcal E^2_n,\mathcal E^1_n)\}.$$

Following [4], we say that the sequences $\mathcal E^1_n$, $n=1,2,\dots$, and $\mathcal E^2_n$, $n=1,2,\dots$, are asymptotically equivalent if

$$\Delta(\mathcal E^1_n,\mathcal E^2_n)\to0 \qquad\text{as } n\to\infty.$$
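Suppose two experiments share the same sample space. Then taking $M$ to be the identity kernel in the total-variation representation gives $\delta(\mathcal E^1_n,\mathcal E^2_n)\le\sup_f\tfrac12\|P^1_{n,f}-P^2_{n,f}\|_{\mathrm{var}}$. The same bound holds for $\delta(\mathcal E^2_n,\mathcal E^1_n)$, and hence for $\Delta$. The toy sketch below uses binomial $\mathrm{Bin}(n,\theta/n)$ and Poisson $\mathrm{Poi}(\theta)$ families, indexed by $\theta$ on a finite grid; these are not the experiments studied in this paper. For this pair, Le Cam's Poisson approximation inequality bounds the total variation distance by $\theta^2/n$.

```python
import math


def binomial_pmf(n, prob):
    """pmf of Bin(n, prob) on {0, ..., n}, computed recursively to avoid huge integers."""
    pmf = [(1.0 - prob) ** n]
    for k in range(1, n + 1):
        pmf.append(pmf[-1] * (n - k + 1) / k * prob / (1.0 - prob))
    return pmf


def poisson_pmf(lam, kmax):
    """pmf of Poi(lam) on {0, ..., kmax}; the negligible tail beyond kmax is dropped."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def tv_distance(p, q):
    """(1/2) * sum_k |p_k - q_k|, padding the shorter pmf with zeros."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def identity_kernel_bound(n, thetas, kmax=100):
    """sup over the grid of TV(Bin(n, theta/n), Poi(theta)): an upper bound on the Le Cam
    distance between the two toy experiments indexed by this grid (identity kernel)."""
    return max(tv_distance(binomial_pmf(n, th / n), poisson_pmf(th, kmax)) for th in thetas)


thetas = [0.5 + 0.25 * k for k in range(7)]  # 0.5, 0.75, ..., 2.0
for n in (10, 100, 1000):
    print(n, identity_kernel_bound(n, thetas))
```

In the paper itself no common sample space is available. This is why the argument in Section 3 works with likelihood ratios coupled on one probability space instead of the identity kernel.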

To formulate our results we also need to impose the following regularity assumptions on the density $p(\cdot)$ of the innovations:

(A2) (i) The density $p$ is positive on $\mathbb R$.

(ii) The log-likelihood function $l_p(x) = \log p(x)$ has three derivatives and satisfies, for some $\epsilon>0$,
$$\int\sup_{|u|\le\epsilon}l_p''(x+u)^2\,p(x)\,dx<\infty, \qquad \sup_{x\in\mathbb R}|l_p'''(x)|\le c_1<\infty.$$

(iii) The score $l_p'(x) = p'(x)/p(x)$ satisfies, for some $\epsilon>0$ and any $\lambda<\infty$,
$$\int\sup_{|u|\le\epsilon}|l_p'(x+u)|^{\lambda}\,p(x)\,dx<\infty.$$

Assumption (A2) mainly requires the existence of three derivatives of $p(\cdot)$ and of the absolute moments of the corresponding scores. Assumptions of this type can be related to the so-called Cramér conditions (see [20], page 102). Assumption (A2) is used here just to simplify the proofs; it could clearly be relaxed substantially. We refer to [15] for a relevant exposition of sufficient assumptions in the case of nonparametric models with independent observations.

In the sequel $q(\cdot)$ denotes a positive density which satisfies the following assumptions:

(A3) (i) The log-likelihood function $l_q(x) = \log q(x)$ has three derivatives and satisfies, for some $\epsilon>0$,
$$\int_{\mathbb R}\sup_{|u|\le\epsilon}l_q''(x+u)^2\,q(x)\,dx<\infty, \qquad \sup_{x\in\mathbb R}|l_q'''(x)|\le c_1<\infty.$$

(ii) The score $l_q'(x) = q'(x)/q(x)$ satisfies, for some $\epsilon>0$ and any $\lambda<\infty$,
$$\int\sup_{|u|\le\epsilon}|l_q'(x+u)|^{\lambda}\,q(x)\,dx<\infty.$$

(iii) The Fisher information corresponding to the density $q(\cdot)$ is the same as that corresponding to $p(\cdot)$, that is,
$$I = \int l_p'(x)^2\,p(x)\,dx = \int l_q'(x)^2\,q(x)\,dx.$$
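Condition (A3)(iii) ties $q$ to $p$ only through the Fisher information. For instance, if $q$ is taken to be a centered Gaussian density, it suffices to choose its variance as $1/I$. The sketch below checks this numerically for a hypothetical logistic innovation density $p$ with scale $s$, whose location Fisher information is $1/(3s^2)$. It only illustrates (iii) and does not verify the remaining parts of (A2) or (A3).

```python
import math


def fisher_information(score, density, lo=-60.0, hi=60.0, steps=240_000):
    """I = int score(x)^2 density(x) dx, approximated by the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(score(x) ** 2 * density(x)
                   for x in (lo + (k + 0.5) * h for k in range(steps)))


S = 1.0  # hypothetical scale of the logistic innovation density p


def logistic_pdf(x):
    # p(x) = exp(-x/S) / (S (1 + exp(-x/S))^2), written in an overflow-safe form
    return 1.0 / (4.0 * S * math.cosh(x / (2.0 * S)) ** 2)


def logistic_score(x):
    # l_p'(x) = p'(x) / p(x) = -tanh(x / (2S)) / S
    return -math.tanh(x / (2.0 * S)) / S


SIGMA = math.sqrt(3.0) * S  # Gaussian q with variance 1/I = 3 S^2


def gauss_pdf(x):
    return math.exp(-0.5 * (x / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))


def gauss_score(x):
    # l_q'(x) = -x / sigma^2
    return -x / SIGMA ** 2


I_p = fisher_information(logistic_score, logistic_pdf)
I_q = fisher_information(gauss_score, gauss_pdf)
print(I_p, I_q)  # both close to 1 / (3 S^2)
```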

We state local versions of asymptotic equivalence, that is, we additionally assume that $f$ lies in a shrinking (as $n\to\infty$) neighborhood of some central function $f_0$. To get a meaningful result, we have to choose this neighborhood large enough that it can be reached with probability tending to 1 by an appropriate preliminary estimator. We fix any $\beta>5/2$ and define

$$\gamma_n = c\Big(\frac{\log n}{n}\Big)^{\beta/(2\beta+1)}, \qquad \gamma_n' = c\Big(\frac{\log n}{n}\Big)^{(\beta-1)/(2\beta+1)}. \tag{5}$$
Here $\gamma_n$ and $\gamma_n'$ are the rates at which the function $f$ and its derivative $f'$ can be estimated in the model (4) and in the corresponding regression models. For any $f_0\in\Sigma$, introduce the neighborhood

$$\Sigma^n_{f_0} = \big\{f\in\Sigma : f(x) = f_0(x),\ x\notin[A,B],\ \|f-f_0\|_\infty<\gamma_n,\ \|f'-f_0'\|_\infty<\gamma_n'\big\},$$

where $A<B$ are two constants.
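The following is a minimal sketch of the rates (5) and of membership in $\Sigma^n_{f_0}$. Membership is checked only on a finite grid, so the check can certify non-membership but only approximates membership. The constant $c=1$, the grid, the central function $f_0=\sin$ and the bump-shaped perturbation are all hypothetical choices made for illustration.

```python
import math


def rates(n, beta, c=1.0):
    """gamma_n and gamma'_n from (5); the constant c is a hypothetical choice."""
    base = math.log(n) / n
    return c * base ** (beta / (2 * beta + 1)), c * base ** ((beta - 1) / (2 * beta + 1))


def in_neighborhood_on_grid(f, df, f0, df0, n, beta, A, B, grid, c=1.0):
    """Grid check of the defining conditions of Sigma^n_{f_0}:
    f = f0 off [A, B], |f - f0| < gamma_n and |f' - f0'| < gamma'_n."""
    gamma, gamma_prime = rates(n, beta, c)
    for x in grid:
        if not (A <= x <= B) and f(x) != f0(x):
            return False
        if abs(f(x) - f0(x)) >= gamma or abs(df(x) - df0(x)) >= gamma_prime:
            return False
    return True


def perturb(f0, df0, height, A, B):
    """Return f0 + b and f0' + b' for the bump b(x) = height (1 - u^2)^3 supported on (A, B),
    where u = (2x - A - B) / (B - A)."""
    def b(x):
        if not (A < x < B):
            return 0.0
        u = (2 * x - A - B) / (B - A)
        return height * (1 - u * u) ** 3

    def db(x):
        if not (A < x < B):
            return 0.0
        u = (2 * x - A - B) / (B - A)
        return -12 * height * u * (1 - u * u) ** 2 / (B - A)

    return (lambda x: f0(x) + b(x)), (lambda x: df0(x) + db(x))


A, B, n, beta = -1.0, 1.0, 10_000, 3.0
grid = [-2.0 + 0.01 * k for k in range(401)]
small = in_neighborhood_on_grid(*perturb(math.sin, math.cos, 0.01, A, B),
                                math.sin, math.cos, n, beta, A, B, grid)
large = in_neighborhood_on_grid(*perturb(math.sin, math.cos, 0.1, A, B),
                                math.sin, math.cos, n, beta, A, B, grid)
print(rates(n, beta), small, large)
```

The bump vanishes outside $[A,B]$, so the perturbed function agrees with $f_0$ there, as the definition of $\Sigma^n_{f_0}$ requires.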

Our main results are the following two theorems, which state the local asymptotic equivalence of our nonparametric autoregressive model to nonparametric regression with random and nonrandom designs. We start with the case of random design.

Theorem 2.1. Let $\mathcal E^n_{f_0} = (\mathbb R^{n+1},\mathcal B^{n+1},\{P_f, f\in\Sigma^n_{f_0}\})$ be the local experiment based on observations $X_i$, $i=0,\dots,n$, obeying (A1) with $f\in\Sigma^n_{f_0}$. Suppose that the density $p(\cdot)$ satisfies assumption (A2). Let $\mathcal G^n_{f_0} = (\mathbb R^{2n},\mathcal B^{2n},\{Q_f, f\in\Sigma^n_{f_0}\})$ be the nonparametric regression model in which we observe

$$Y_i = f(\xi_i) + \eta_i, \qquad i=1,\dots,n. \tag{6}$$

Here $\eta_1,\dots,\eta_n$ are i.i.d. with density $q(\cdot)$ obeying (A3), and $\xi_1,\dots,\xi_n$ are i.i.d. with the common density $\psi_{f_0}$, independent of $\eta_1,\dots,\eta_n$. The function $f\in\Sigma^n_{f_0}$ is unknown. Then, for all $\beta>5/2$, the sequences of experiments $\mathcal E^n_{f_0}$, $n=1,2,\dots$, and $\mathcal G^n_{f_0}$, $n=1,2,\dots$, are asymptotically equivalent uniformly in $f_0\in\Sigma$:

$$\sup_{f_0\in\Sigma}\Delta(\mathcal E^n_{f_0},\mathcal G^n_{f_0})\to0 \qquad\text{as } n\to\infty.$$

Our second local result states asymptotic equivalence to the regression model with nonrandom design.

Theorem 2.2. Let $\mathcal E^n_{f_0} = (\mathbb R^{n+1},\mathcal B^{n+1},\{P_f, f\in\Sigma^n_{f_0}\})$ be the local experiment based on observations $X_i$, $i=0,\dots,n$, obeying assumption (A1) with $f\in\Sigma^n_{f_0}$. Assume that the density $p(\cdot)$ satisfies assumption (A2). Let $\mathcal G^{\prime n}_{f_0} = (\mathbb R^n,\mathcal B^n,\{Q'_f, f\in\Sigma^n_{f_0}\})$ be the nonparametric regression model in which we observe

$$Y_{n,i} = f(t_{n,i}) + \eta_i, \qquad i=1,\dots,n, \tag{7}$$

where $\eta_1,\dots,\eta_n$ are i.i.d. with density $q(\cdot)$ obeying assumption (A3). Furthermore, $t_{n,1},\dots,t_{n,n}$ are nonrandom design points chosen according to the density $\psi_{f_0}(\cdot)$, that is, $(i-1/2)/n = \int_{-\infty}^{t_{n,i}}\psi_{f_0}(x)\,dx$, $i=1,\dots,n$, and $f\in\Sigma^n_{f_0}$ is unknown. Then, for all $\beta>5/2$, the sequences of experiments $\mathcal E^n_{f_0}$, $n=1,2,\dots$, and $\mathcal G^{\prime n}_{f_0}$, $n=1,2,\dots$, are asymptotically equivalent uniformly in $f_0\in\Sigma$:

$$\sup_{f_0\in\Sigma}\Delta(\mathcal E^n_{f_0},\mathcal G^{\prime n}_{f_0})\to0 \qquad\text{as } n\to\infty.$$
Remark 1. As a by-product of our main results, we also obtain asymptotic equivalence of nonparametric regression with random and regular nonrandom design. However, our construction of the likelihood ratios is based on a Skorokhod embedding rather than a KMT construction. The rate for the approximation error between the likelihood ratios of both models is therefore presumably not the best possible one. We imposed the constraint $\beta>5/2$ to prove asymptotic equivalence of nonparametric autoregression and nonparametric regression. We conjecture that this constraint can be further relaxed for asymptotic equivalence of nonparametric regression with random and regular nonrandom design. It follows from the results in [4] and [2] that in the special case of Gaussian errors this equivalence holds even for $\beta>1/2$.

Remark 2. Our results on asymptotic equivalence of nonparametric regression and autoregression in the Le Cam sense can be used to transfer existing lower asymptotic efficiency bounds from nonparametric regression to nonparametric autoregression, when the loss is measured in the supremum norm. Indeed, the calculations in [9], Section 5, show that a shrinking neighborhood of size $O((\log n/n)^{\beta/(2\beta+1)})$ around some central function $f_0$ is large enough for generating the desired risk bound. Hence, we can actually deduce these lower asymptotic efficiency bounds in the cases $\beta>5/2$ covered by our results.

Owing to the local character of our results (asymptotic equivalence is proved for shrinking neighborhoods of $f_0$), we cannot use them directly for transferring upper asymptotic risk bounds. However, such bounds can easily be derived, either by straightforward calculations or from asymptotic equivalence results between nonparametric estimators in both settings, as given by strong approximations in [22].

The possibility of transferring asymptotic efficiency bounds on the basis of the asymptotic equivalence of experiments has been known for a long time. Korostelev and Nussbaum [18] applied this principle to deduce asymptotic minimax bounds in nonparametric density estimation from known results in signal estimation in Gaussian white noise. On the basis of local equivalence results, Drees [11] transferred available lower asymptotic risk bounds from the Gaussian white noise model to the case of estimating an extreme value index.

3. Proofs of the main theorems. In this section we shall prove Theorem 2.1. Theorem 2.2 can be derived in the same way.

Our method of estimating the Le Cam distance $\Delta(\mathcal E^n_{f_0},\mathcal G^n_{f_0})$ runs as follows. Let $X_0,\dots,X_n$ be the observations obeying assumption (A1) with $f\in\Sigma^n_{f_0}$ and let $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ be the observations defined in Theorem 2.1. Denote by $L^{1,n}_{f,f_0}$ and $L^{2,n}_{f,f_0}$ the likelihood ratio processes of the experiments $\mathcal E^n_{f_0}$ and $\mathcal G^n_{f_0}$, respectively,

$$L^{1,n}_{f,f_0} = \frac{\psi_f(X_0)}{\psi_{f_0}(X_0)}\prod_{i=1}^n\frac{p(X_i - f(X_{i-1}))}{p(X_i - f_0(X_{i-1}))}$$

and

$$L^{2,n}_{f,f_0} = \prod_{i=1}^n\frac{q(Y_i - f(\xi_i))}{q(Y_i - f_0(\xi_i))}.$$

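According to Proposition 2.2 in [23] (see also [20], page 16, for a similar assertion in the parametric context), the deficiency distance can be estimated as

$$\Delta(\mathcal E^n_{f_0},\mathcal G^n_{f_0}) \le \sup_{f\in\Sigma^n_{f_0}}E_{\tilde P}\big|\tilde L^{1,n}_{f,f_0} - \tilde L^{2,n}_{f,f_0}\big|. \tag{8}$$

Here $\tilde L^{1,n}_{f,f_0}$ and $\tilde L^{2,n}_{f,f_0}$ are arbitrary versions of the likelihood ratios $L^{1,n}_{f,f_0}$ and $L^{2,n}_{f,f_0}$. They are constructed on a common probability space $(\tilde\Omega,\tilde{\mathcal A},\tilde P)$ and distributed according to the central measure $P_{f_0}$. The versions $\tilde L^{1,n}_{f,f_0}$ and $\tilde L^{2,n}_{f,f_0}$ will be constructed in such a way that the right-hand side of (8) tends to zero as $n\to\infty$. Since this will hardly cause any confusion, we drop the tildes in the notation of $\tilde L^{1,n}_{f,f_0}$ and $\tilde L^{2,n}_{f,f_0}$. With this agreement, inequality (8) can be written as

$$\Delta(\mathcal E^n_{f_0},\mathcal G^n_{f_0}) \le \sup_{f\in\Sigma^n_{f_0}}E_{f_0}\big|L^{1,n}_{f,f_0} - L^{2,n}_{f,f_0}\big|. \tag{9}$$

The subscript $f_0$ at the expectation indicates that the measure $P$ corresponds to the central measure $P_{f_0}$.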

First, we give a bound for the $L_1$-distance on the right-hand side of (9) in terms of the Hellinger distance:

$$E_{f_0}\big|L^{1,n}_{f,f_0} - L^{2,n}_{f,f_0}\big| \le 2H(P_f,Q_f) = 2\Big(E_{f_0}\Big(\sqrt{L^{1,n}_{f,f_0}} - \sqrt{L^{2,n}_{f,f_0}}\Big)^2\Big)^{1/2}, \tag{10}$$

where $L^{1,n}_{f,f_0}$ and $L^{2,n}_{f,f_0}$ mean the corresponding versions of the likelihood ratios. Here $H(P,Q)$ denotes the Hellinger distance between two probability measures $P$ and $Q$, normalized as $H^2(P,Q) = \int(\sqrt{dP}-\sqrt{dQ})^2$. We follow an idea originating from [23] in the context of density estimation and from [14] in the context of regression with independent observations. Namely, we shall use an analogue of the following property of the Hellinger distance for product measures (see Lemma 2.17 in [27]):

$$H^2\Big(\bigotimes_{l=1}^{K_n}P^{(l)},\bigotimes_{l=1}^{K_n}Q^{(l)}\Big) \le \sum_{l=1}^{K_n}H^2\big(P^{(l)},Q^{(l)}\big), \tag{11}$$
where $P^{(l)}$ and $Q^{(l)}$ are the measures corresponding to certain disjoint blocks of observations and $K_n$ is a sequence satisfying $K_n\to\infty$ and $K_n/n\to0$. The blocks will be chosen small enough that one can get reasonable estimates for $H^2(P^{(l)},Q^{(l)})$. The estimate (11) relies essentially on the product structure of the measures $\bigotimes_lP^{(l)}$ and $\bigotimes_lQ^{(l)}$. In general, it does not directly apply to dependent observations.
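Both (10) and (11) are easy to check numerically for discrete distributions. The sketch below assumes the normalization $H^2(P,Q)=\sum_x(\sqrt{P(x)}-\sqrt{Q(x)})^2$. Other texts include a factor $1/2$; inequality (11) holds under either convention, while the constant 2 in (10) corresponds to this one. The sketch draws random probability vectors for three independent blocks. It then verifies the product inequality (11) together with $\sum_x|P(x)-Q(x)|\le2H(P,Q)$.

```python
import itertools
import math
import random


def hellinger_sq(p, q):
    """H^2(P, Q) = sum_x (sqrt P(x) - sqrt Q(x))^2 (no factor 1/2)."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def random_pmf(rng, size):
    """A random strictly positive probability vector of the given length."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def product_pmf(factors):
    """pmf of the product measure, listed over the Cartesian product of the supports."""
    return [math.prod(values) for values in itertools.product(*factors)]


rng = random.Random(1)
for trial in range(200):
    P_blocks = [random_pmf(rng, 3) for _ in range(3)]
    Q_blocks = [random_pmf(rng, 3) for _ in range(3)]
    P, Q = product_pmf(P_blocks), product_pmf(Q_blocks)
    lhs = hellinger_sq(P, Q)
    rhs = sum(hellinger_sq(p, q) for p, q in zip(P_blocks, Q_blocks))
    assert lhs <= rhs + 1e-12                                                   # inequality (11)
    assert sum(abs(a - b) for a, b in zip(P, Q)) <= 2 * math.sqrt(lhs) + 1e-12  # L1 <= 2H, cf. (10)
print("checked")
```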

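In the particular context of the dependent data under consideration, we proceed as follows. Set $K_n = [n^{1/3}]$. Split the set of indices $\{1,\dots,n\}$ into $K_n$ blocks,

$$I_l = \Big\{i : (l-1)\frac{n}{K_n} < i \le l\frac{n}{K_n}\Big\}, \qquad l = 1,\dots,K_n.$$

Denote by $m_l$ the number of elements in the block $I_l$, that is, $m_l = \#I_l = O(n^{2/3})$. Let $i_l$ be the first element in the set $I_l$. Furthermore, let $\mathcal F_0$ be the trivial $\sigma$-field and, for $1\le l\le K_n$,

$$\mathcal F_l = \sigma\big(X_0,\dots,X_{i_l-1},(Y_1,\xi_1),\dots,(Y_{i_l-1},\xi_{i_l-1})\big).$$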
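The following is a minimal sketch of this block construction. It reads the number of blocks as $K_n=[n^{1/3}]$ and computes it with an integer cube root, so that floating-point error cannot shift the floor. It returns the blocks $I_l$, from which the sizes $m_l$ and first elements $i_l$ follow. The function names are hypothetical.

```python
def integer_cube_root(n):
    """Largest integer K with K**3 <= n (exact, no floating-point floor errors)."""
    k = int(round(n ** (1.0 / 3.0)))
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k


def split_into_blocks(n):
    """Blocks I_l = {i : (l - 1) n / K_n < i <= l n / K_n}, l = 1, ..., K_n, with K_n = [n^(1/3)]."""
    K = integer_cube_root(n)
    return [list(range((l - 1) * n // K + 1, l * n // K + 1)) for l in range(1, K + 1)]


blocks = split_into_blocks(1000)
sizes = [len(b) for b in blocks]         # m_l
first_elements = [b[0] for b in blocks]  # i_l
print(len(blocks), sizes[:3], first_elements[:3])
```

Integer division gives exactly the index ranges in the definition of $I_l$: an integer $i$ satisfies $(l-1)n/K_n<i\le ln/K_n$ iff $\lfloor(l-1)n/K_n\rfloor<i\le\lfloor ln/K_n\rfloor$.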

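The likelihood ratio corresponding to the observations $X_0,\dots,X_n$ can be written as the product

$$L^{1,n}_{f,f_0} = \prod_{l=0}^{K_n}L^{1,(l)}_{f,f_0}, \qquad L^{1,(l)}_{f,f_0} = \prod_{i\in I_l}\frac{p(X_i - f(X_{i-1}))}{p(X_i - f_0(X_{i-1}))}, \quad 1\le l\le K_n,$$

where $L^{1,(0)}_{f,f_0} = \psi_f(X_0)/\psi_{f_0}(X_0)$ and $L^{1,(l)}_{f,f_0}$ is the conditional (given $\mathcal F_l$) likelihood ratio generated by $(X_i : i\in I_l)$. Analogously, in the case of a regression experiment with random design, we have that

$$L^{2,n}_{f,f_0} = \prod_{l=0}^{K_n}L^{2,(l)}_{f,f_0}, \qquad L^{2,(l)}_{f,f_0} = \prod_{i\in I_l}\frac{q(Y_i - f(\xi_i))}{q(Y_i - f_0(\xi_i))}, \quad 1\le l\le K_n,$$

where $L^{2,(0)}_{f,f_0} = 1$. A generalization of (11) to our setting with dependent random variables is given by the following result.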

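Lemma 3.1.

$$E_{f_0}\Big(\sqrt{L^{1,n}_{f,f_0}} - \sqrt{L^{2,n}_{f,f_0}}\Big)^2 \le \sum_{l=0}^{K_n}\operatorname*{ess\,sup}E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F_l\Big).$$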
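Proof. The proof of this assertion is adapted from that of Lemma 2.17 in [27]. We rewrite the Hellinger distance as

$$E_{f_0}\Big(\sqrt{L^{1,n}_{f,f_0}} - \sqrt{L^{2,n}_{f,f_0}}\Big)^2 = 2 - 2E_{f_0}\prod_{l=0}^{K_n}\sqrt{L^{1,(l)}_{f,f_0}L^{2,(l)}_{f,f_0}}.$$

For the last term one easily deduces, by conditioning on $\mathcal F_{K_n}$,

$$E_{f_0}\prod_{l=0}^{K_n}\sqrt{L^{1,(l)}_{f,f_0}L^{2,(l)}_{f,f_0}} \ge E_{f_0}\Big[\prod_{l=0}^{K_n-1}\sqrt{L^{1,(l)}_{f,f_0}L^{2,(l)}_{f,f_0}}\Big]\operatorname*{ess\,inf}E_{f_0}\Big(\sqrt{L^{1,(K_n)}_{f,f_0}L^{2,(K_n)}_{f,f_0}}\,\Big|\,\mathcal F_{K_n}\Big).$$

Continuing in the same way we obtain

$$E_{f_0}\prod_{l=0}^{K_n}\sqrt{L^{1,(l)}_{f,f_0}L^{2,(l)}_{f,f_0}} \ge \prod_{l=0}^{K_n}\operatorname*{ess\,inf}E_{f_0}\Big(\sqrt{L^{1,(l)}_{f,f_0}L^{2,(l)}_{f,f_0}}\,\Big|\,\mathcal F_l\Big) = \prod_{l=0}^{K_n}\Big(1 - \frac12\operatorname*{ess\,sup}E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F_l\Big)\Big),$$

where the last equality uses $E_{f_0}(L^{i,(l)}_{f,f_0}\mid\mathcal F_l) = 1$, $i=1,2$. Using the inequality $1 - \prod_l(1-a_l)\le\sum_la_l$, which is true for all $0\le a_l\le1$, we obtain the assertion of the lemma. □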



Hence, we have an analogue of (11) for the case of dependent random variables. Separability, which is equivalent to independence of the factors in the product, is achieved by passing to the "worst case". This worst case is appropriately expressed by $\operatorname{ess\,sup}E_{f_0}\big((\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}})^2\mid\mathcal F_l\big)$.

Note that, since $p$ is positive on $\mathbb R$ and $\sup_{f\in\Sigma}\|f\|_\infty\le M<\infty$, the condition $\rho<1$ of Lemma 6.1 below is satisfied with some $\rho$ depending on $p$ and $M$. Then Lemma 6.1 and assumption (A2) imply, as $n\to\infty$,

$$\sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}E_{f_0}\Big(\sqrt{L^{1,(0)}_{f,f_0}} - \sqrt{L^{2,(0)}_{f,f_0}}\Big)^2 = \sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}E_{f_0}\Big(\sqrt{\psi_f(X_0)/\psi_{f_0}(X_0)} - 1\Big)^2 \to 0. \tag{12}$$

Now Theorem 2.1 follows from Lemma 3.1, (12) and the following assertion, which provides bounds for the conditional Hellinger distance.
Proposition 3.1. Suppose that assumptions (A1)–(A3) are satisfied. Then there exists a construction of the sequences $X_0,\dots,X_n$ and $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ on a common probability space such that

$$\max_{1\le l\le K_n}\sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,sup}E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F_l\Big) = o(K_n^{-1}).$$

The proof of this proposition is postponed to Section 4.

In the case of comparing nonparametric autoregression and regression with regular nonrandom design, we proceed analogously. We use the same splitting of the set of indices $\{1,\dots,n\}$ into blocks $I_1,\dots,I_{K_n}$ as above. The pairs $(Y_{n,1},t_{n,1}),\dots,(Y_{n,n},t_{n,n})$ are rearranged in such a way that

for all $i\in\{1,\dots,m_l\}$, $l\in\{1,\dots,K_n\}$. Then we write the likelihood ratio as

$$L^{3,n}_{f,f_0} = \prod_{l=0}^{K_n}L^{3,(l)}_{f,f_0}, \qquad L^{3,(l)}_{f,f_0} = \prod_{i\in I_l}\frac{q(Y_{n,i} - f(t_{n,i}))}{q(Y_{n,i} - f_0(t_{n,i}))},$$

where $L^{3,(0)}_{f,f_0} = 1$. Let $\mathcal F'_0$ be the trivial $\sigma$-field and, for $l=1,\dots,K_n$, $\mathcal F'_l = \sigma(X_0,\dots,X_{i_l-1},Y_{n,1},\dots,Y_{n,i_l-1})$.

Using the same arguments as in the proof of Proposition 3.1, we obtain the following assertion.

Proposition 3.2. Suppose that assumptions (A1)–(A3) are satisfied. Then there exists a construction of the sequences $X_0,\dots,X_n$ and $Y_{n,1},\dots,Y_{n,n}$ on a common probability space such that, as $n\to\infty$,

$$\max_{1\le l\le K_n}\sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,sup}E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{3,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F'_l\Big) = o(K_n^{-1}).$$

Theorem 2.2 follows from Lemma 3.1, (12) and Proposition 3.2.



4. Proofs of Propositions 3.1 and 3.2.

Proof of Proposition 3.1. Let $X_0,\dots,X_n$ and $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ be the observations generated according to (4) and (6). By Theorem 5.1, these sequences can be constructed on a common probability space, coupled so that the assertion of Theorem 5.1 holds true. Without loss of generality, we can assume that the sequences $X_0,\dots,X_n$ and $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ are already constructed on the probability space $(\Omega,\mathcal A)$ endowed with the central measure $P_{f_0}$.

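Recall that $m_l = \#I_l = O(n^{2/3})$ is the number of indices in the set $I_l$ and that $K_n = [n^{1/3}]$ is the number of blocks. Set, for brevity, $g(x) = f(x) - f_0(x)$. Since $f\in\Sigma^n_{f_0}$, we have $\|g\|_\infty\le\gamma_n$ and $\|g'\|_\infty\le\gamma_n'$. We have $\sup_x|l_p'''(x)|\le c_1$ [by assumption (A2)(ii)] and $\gamma_n^3m_l = o(K_n^{-1/2})$. Hence a Taylor series expansion gives

$$\log L^{1,(l)}_{f,f_0} = -\sum_{i\in I_l}g(X_{i-1})\,l_p'(\varepsilon_i) + \frac12\sum_{i\in I_l}g(X_{i-1})^2\,l_p''(\varepsilon_i) + o(K_n^{-1/2}), \tag{13}$$

where $\varepsilon_i = X_i - f_0(X_{i-1})$, and, in the same way,

$$\log L^{2,(l)}_{f,f_0} = -\sum_{i\in I_l}g(\xi_i)\,l_q'(\eta_i) + \frac12\sum_{i\in I_l}g(\xi_i)^2\,l_q''(\eta_i) + o(K_n^{-1/2}). \tag{14}$$

We introduce the set $A_l = A_{l,1}\cap A_{l,2}$, where

$$A_{l,1} = \Big\{\big|T^{1,(l)} - T^{2,(l)}\big| \le c_1(\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}\log m_l\Big\},$$

and $v_n\to0$ sufficiently slowly. An appropriate choice of the sequence $v_n$ is described in the course of the proof of Lemma 4.1 below. We bound the Hellinger distance between the partial likelihoods $L^{1,(l)}_{f,f_0}$ and $L^{2,(l)}_{f,f_0}$ as

$$E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F_l\Big) \le R_1 + R_2, \tag{15}$$

say, where

$$R_1 = E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\mathbf 1_{A_l}\,\Big|\,\mathcal F_l\Big), \qquad R_2 = E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\mathbf 1_{A_l^c}\,\Big|\,\mathcal F_l\Big).$$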

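First we bound $R_1$. On the set $A_l$, we get

$$\big|\log L^{1,(l)}_{f,f_0} - \log L^{2,(l)}_{f,f_0}\big| = O\big((\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}\log m_l\big) + o(K_n^{-1/2}).$$

Since $\beta>5/2$, we have $(\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}\log m_l = o(K_n^{-1/2})$, which in turn implies that

$$\Big|\sqrt{L^{1,(l)}_{f,f_0}/L^{2,(l)}_{f,f_0}} - 1\Big| = \Big|\exp\Big(\tfrac12\log L^{1,(l)}_{f,f_0} - \tfrac12\log L^{2,(l)}_{f,f_0}\Big) - 1\Big| = o(K_n^{-1/2}).$$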
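The role of the condition $\beta>5/2$ in this step can be checked by bookkeeping the powers of $n$, ignoring log factors. The check assumes the block sizes $K_n\asymp n^{1/3}$ and $m_l\asymp n^{2/3}$. The sketch computes, in exact rational arithmetic, the exponent of $n$ in $(\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}$ divided by $K_n^{-1/2}$.

```python
from fractions import Fraction

# Assumed block-size exponents: K_n ~ n^(1/3) blocks, each of size m_l ~ n^(2/3).
K_EXP = Fraction(1, 3)
M_EXP = Fraction(2, 3)


def gamma_exp(beta):
    """Exponent of n in gamma_n = c (log n / n)^(beta / (2 beta + 1)), logs ignored."""
    return -beta / (2 * beta + 1)


def gamma_prime_exp(beta):
    """Exponent of n in gamma'_n = c (log n / n)^((beta - 1) / (2 beta + 1)), logs ignored."""
    return -(beta - 1) / (2 * beta + 1)


def margin(beta):
    """Exponent of n in (gamma_n gamma'_n)^(1/2) m_l^(1/4) divided by K_n^(-1/2)."""
    return (gamma_exp(beta) + gamma_prime_exp(beta)) / 2 + M_EXP / 4 + K_EXP / 2


for beta in (Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)):
    print(beta, margin(beta))
```

The margin equals $1/3-(2\beta-1)/(2(2\beta+1))$. It vanishes exactly at $\beta=5/2$ and is strictly negative for $\beta>5/2$. A strictly negative power of $n$ is what absorbs the $\log m_l$ factor and yields $o(K_n^{-1/2})$.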



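Taking into account that $E_{f_0}(L^{2,(l)}_{f,f_0}\mid\mathcal F_l) = 1$, we get

$$R_1 \le E_{f_0}\Big(L^{2,(l)}_{f,f_0}\Big(\sqrt{L^{1,(l)}_{f,f_0}/L^{2,(l)}_{f,f_0}} - 1\Big)^2\mathbf 1_{A_l}\,\Big|\,\mathcal F_l\Big) = o(K_n^{-1}). \tag{16}$$

Now we shall bound $R_2$. Set $B_l = \{\log L^{1,(l)}_{f,f_0}\le1\}$ and $C_l = \{\log L^{2,(l)}_{f,f_0}\le1\}$. Then

$$R_2 \le 4e\,P_{f_0}(A_l^c\mid\mathcal F_l) + 2E_{f_0}\big(L^{1,(l)}_{f,f_0}\mathbf 1_{B_l^c}\mid\mathcal F_l\big) + 2E_{f_0}\big(L^{2,(l)}_{f,f_0}\mathbf 1_{C_l^c}\mid\mathcal F_l\big). \tag{17}$$

We will prove that

$$P_{f_0}(A_l^c\mid\mathcal F_l) = o(K_n^{-1}), \qquad P_{f_0}\text{-a.s.}, \tag{18}$$

and that

$$E_{f_0}\big(L^{1,(l)}_{f,f_0}\mathbf 1_{B_l^c}\mid\mathcal F_l\big) = o(K_n^{-1}), \qquad E_{f_0}\big(L^{2,(l)}_{f,f_0}\mathbf 1_{C_l^c}\mid\mathcal F_l\big) = o(K_n^{-1}). \tag{19}$$

Then, in conjunction with (15)–(17), we obtain the desired bound

$$E_{f_0}\Big(\Big(\sqrt{L^{1,(l)}_{f,f_0}} - \sqrt{L^{2,(l)}_{f,f_0}}\Big)^2\,\Big|\,\mathcal F_l\Big) = o(K_n^{-1}).$$

Hence, it remains to prove (18) and (19).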

First we prove (18). By Theorem 5.1 (with some $\lambda$ large enough) we have that

$$P_{f_0}(A_{l,1}^c\mid\mathcal F_l) = O(m_l^{-\lambda}) = o(K_n^{-1}). \tag{20}$$

To complete the proof of (18) we shall prove the following bound.

Lemma 4.1.

$$P_{f_0}(A_{l,2}^c\mid\mathcal F_l) = o(K_n^{-1}). \tag{21}$$

Proof. We shall use the fact that the Markov chain $X_0,\dots,X_n$ is $\phi$-mixing. Decompose the set $I_l$ as $I_l = I_l^{(1)}\cup I_l^{(2)}$. Here $I_l^{(1)}$ contains the first $c_0\log m_l$ elements of $I_l$ and $I_l^{(2)}$ the remaining ones, where the positive constant $c_0$ will be chosen below. Let $l\in\{1,\dots,K_n\}$ and let $i$ be the first element of $I_l^{(2)}$. According to Lemma 6.2 in Section 6.2, we can construct a version $\tilde X_{i-1}$ of the r.v. $X_{i-1}$ on the same probability space, such that $\tilde X_{i-1}$ is independent of $\mathcal F_l$ and

$$P_{f_0}\big(\tilde X_{i-1}\ne X_{i-1}\,\big|\,\mathcal F_l\big) \le c\,\rho^{c_0\log m_l} \tag{22}$$

for some large enough constant $c_0$ and for some $\rho<1$. Having constructed $\tilde X_{i-1}$ for some $i\in I_l^{(2)}$, we define recursively a version $\tilde X_i$ of the r.v. $X_i$ on the same probability space. This version is independent of $\mathcal F_l$ and of $X_{i-c_0\log m_l},\dots,X_{i_l}$, $\tilde X_{i-c_0\log m_l},\dots,\tilde X_{i_l}$, and

Choosing $c_0$ large enough, $\tilde X_{i_l-1},\dots,\tilde X_{i_{l+1}-1}$ satisfy

$$P_{f_0}\big(\tilde X_i\ne X_i\ \text{for some}\ i\in\{i_l-1,\dots,i_{l+1}-1\}\,\big|\,\mathcal F_l\big) = O\big(m_l\rho^{c_0\log m_l}\big). \tag{23}$$
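Denote

$$\tilde T_2^{1,(l)} = \sum_{i\in I_l^{(2)}}d_i, \qquad T_2^{2,(l)} = \sum_{i\in I_l}g(\xi_i)^2\,l_q''(\eta_i),$$

where $d_i = g(\tilde X_{i-1})^2\,l_p''(\varepsilon_i)$. Since $\tilde T_2^{1,(l)}$ is a sum of $c_0\log m_l$-dependent r.v.'s, using Chebyshev's inequality, we obtain

$$P_{f_0}\big(|\tilde T_2^{1,(l)} - E_{f_0}(\tilde T_2^{1,(l)}\mid\mathcal F_l)| > v_nK_n^{-1/2}\,\big|\,\mathcal F_l\big) = O\big(v_n^{-2}K_n\gamma_n^4m_l\log m_l\big) = o\big(v_n^{-2}n^{-2/3}\big).$$

Choosing $v_n$ such that $v_n\to0$ and $o(v_n^{-2}n^{-2/3}) = o(K_n^{-1})$, we get

$$P_{f_0}\big(|\tilde T_2^{1,(l)} - E_{f_0}(\tilde T_2^{1,(l)}\mid\mathcal F_l)| > v_nK_n^{-1/2}\,\big|\,\mathcal F_l\big) = o(K_n^{-1}). \tag{24}$$

By similar arguments for sums of independent random variables, we can show that

$$P_{f_0}\big(|T_2^{2,(l)} - E_{f_0}T_2^{2,(l)}| > v_nK_n^{-1/2}\,\big|\,\mathcal F_l\big) = o(K_n^{-1}). \tag{25}$$

Taking into account that $E_{f_0}l_p''(\varepsilon_i) = E_{f_0}l_q''(\eta_i) = -I$ [by assumption (A3)(iii)], we obtain

$$E_{f_0}(\tilde T_2^{1,(l)}\mid\mathcal F_l) - E_{f_0}T_2^{2,(l)} = O(\gamma_n^2\log m_l) - I\Big(\sum_{i\in I_l^{(2)}}E_{f_0}\big(g(\tilde X_{i-1})^2\mid\mathcal F_l\big) - \sum_{i\in I_l^{(2)}}E_{f_0}g(\xi_i)^2\Big).$$

Since $\tilde X_{i-1}$ and $\xi_i$ have the same density $\psi_{f_0}$, we get $E_{f_0}(g(\tilde X_{i-1})^2\mid\mathcal F_l) = E_{f_0}[g(\xi_i)^2]$ and thus

$$E_{f_0}(\tilde T_2^{1,(l)}\mid\mathcal F_l) - E_{f_0}T_2^{2,(l)} = O(\gamma_n^2\log m_l) = o(K_n^{-1}). \tag{26}$$

By (23)–(26) we get

$$P_{f_0}(A_{l,2}^c\mid\mathcal F_l) \le o(K_n^{-1}) + P_{f_0}\big(\tilde X_i\ne X_i\ \text{for some}\ i\in\{i_l-1,\dots,i_{l+1}-1\}\,\big|\,\mathcal F_l\big) = o(K_n^{-1}),$$

which proves (21). □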

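Now we prove (19). We give a proof for the first bound; the second one can be proved in the same way. Changing the probability measure, we obtain

$$E_{f_0}\big(L^{1,(l)}_{f,f_0}\mathbf 1_{B_l^c}\,\big|\,\mathcal F_l\big) = P_f(B_l^c\mid\mathcal F_l).$$

We shall prove that

$$P_f\big(\log L^{1,(l)}_{f,f_0} > 1\,\big|\,\mathcal F_l\big) = o(K_n^{-1}). \tag{27}$$

Indeed, proceed as in the proof of (24), using assumption (A2)(ii) and the fact that $\varepsilon_i = X_i - f_0(X_{i-1}) = X_i - f(X_{i-1}) + O(\gamma_n)$. One gets

$$P_f\big(|T_2^{1,(l)} - E_f(T_2^{1,(l)}\mid\mathcal F_l)| > cK_n^{-1/2}\,\big|\,\mathcal F_l\big) = o(K_n^{-1}). \tag{28}$$

Since $E_f(T_2^{1,(l)}\mid\mathcal F_l) = o(1)$, we get from (13) and (28)

$$P_f\big(\log L^{1,(l)}_{f,f_0} > 1\,\big|\,\mathcal F_l\big) \le o(K_n^{-1}) + P_f\big(T^{1,(l)} > \tfrac12\,\big|\,\mathcal F_l\big). \tag{29}$$

If we prove that

$$P_f\big(T^{1,(l)} > \tfrac12\,\big|\,\mathcal F_l\big) = o(K_n^{-1}), \tag{30}$$

then (27) follows in conjunction with (29).

To prove (30) we use the exponential Chebyshev inequality for martingales. Since $\beta>5/2$, by (5), we have $\gamma_n = o(n^{-5/12-\delta})$ for some $\delta>0$ small enough. Recall that $m_l = O(n^{2/3})$ and $\|g\|_\infty\le\gamma_n$. Assume first that $n^{-\delta}|l_p'(\varepsilon_i)|\le\text{const}$. Using Lemma 6.3,

$$\begin{aligned}P_f\big(T^{1,(l)} > \tfrac12\,\big|\,\mathcal F_l\big) &\le e^{-n^{\delta}/2}E_f\Big(\exp\Big(c\,n^{2\delta}\sum_{i\in I_l}g(X_{i-1})^2\,l_p'(\varepsilon_i)^2\Big)\,\Big|\,\mathcal F_l\Big)\\ &\le e^{-n^{\delta}/2}E_f\Big(\exp\Big(c\,n^{2\delta}\gamma_n^2\sum_{i\in I_l}l_p'(\varepsilon_i)^2\Big)\,\Big|\,\mathcal F_l\Big)\\ &\le e^{-n^{\delta}/2}\exp\Big(c\,n^{2\delta}\gamma_n^2m_l\max_{i\in I_l}l_p'(\varepsilon_i)^2\Big),\end{aligned}$$

where $c$ is a constant. The latter implies (30). If $n^{-\delta}|l_p'(\varepsilon_i)|\le\text{const}$ is not satisfied, we use the same arguments with truncated scores instead of the true scores $l_p'(\varepsilon_i)$. These are $\tilde l_i = l_i - E_f(l_i\mid X_{i-1})$ with $l_i = l_p'(\varepsilon_i)\mathbf 1(|l_p'(\varepsilon_i)|\le n^{\delta})$. The term with the difference $l_p'(\varepsilon_i) - \tilde l_i$ is bounded easily as before. This uses Chebyshev's inequality, assumption (A2)(iii) and the fact that $\varepsilon_i = X_i - f_0(X_{i-1}) = X_i - f(X_{i-1}) + O(\gamma_n)$:

$$P_f\Big(\sum_{i\in I_l}g(X_{i-1})\big(l_p'(\varepsilon_i) - \tilde l_i\big) > \tfrac14\,\Big|\,\mathcal F_l\Big) = O\big(\gamma_n^2m_l\,E_f\,l_p'(\varepsilon_i)^2\mathbf 1(|l_p'(\varepsilon_i)|>n^{\delta})\big) = o(K_n^{-1}).$$

Using the same types of arguments for sums of independent random variables we obtain

$$E_{f_0}\big(L^{1,(l)}_{f,f_0}\mathbf 1_{B_l^c}\,\big|\,\mathcal F_l\big) = P_f(B_l^c\mid\mathcal F_l) = P_f\big(\log L^{1,(l)}_{f,f_0} > 1\,\big|\,\mathcal F_l\big) = o(K_n^{-1}),$$

which completes the proof of the first bound in (19). □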



Proof of Proposition 3.2. This proof is analogous to that of Proposition 3.1 and requires only a few minor modifications. Analogously to (13) and (14), we use the Taylor expansion

$$\log L^{3,(l)}_{f,f_0} = -\sum_{i\in I_l}g(t_{n,i})\,l_q'(\eta_i) + \frac12\sum_{i\in I_l}g(t_{n,i})^2\,l_q''(\eta_i) + o(K_n^{-1/2}). \tag{31}$$

Similarly to the calculations in the proof of Proposition 3.1, the closeness of $T^{1,(l)}$ and $T^{3,(l)}$ follows from Theorem 5.2. The closeness of $T_2^{1,(l)}$ and $T_2^{3,(l)}$ follows in complete analogy to the derivation of (21). □

5. A functional strong approximation result. In the proof of the main results we use the following strong approximation theorem. It can be viewed as an analogue of the functional strong approximation result established in [16] for sums of independent random variables.

Let $X_i$, $i=1,\dots,n$, and $(Y_i,\xi_i)$, $i=1,\dots,n$, be defined according to (4) and (6), respectively. Let $f_0\in\Sigma$ and $f\in\Sigma^n_{f_0}$. We set

$$S^{1,(l)} = \sum_{i\in I_l}g(X_{i-1})\,l_p'(\varepsilon_i), \qquad S^{2,(l)} = \sum_{i\in I_l}g(\xi_i)\,l_q'(\eta_i).$$

Theorem 5.1. Suppose that assumptions (A1)–(A3) are satisfied. Let $\lambda>1$ be a constant. Then there are versions of the random variables $X_0,\dots,X_n$ and $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ on a common probability space such that, for $1\le l\le K_n$,

$$\sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,sup}P_{f_0}\big(|S^{1,(l)} - S^{2,(l)}| > c(\lambda)r_n\,\big|\,\mathcal F_l\big) = O(m_l^{-\lambda}),$$

where $r_n = (\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}\log m_l + m_l^{-\lambda}$ and $c(\lambda)$ is a constant depending only on $\lambda$.

The proof of this functional approximation result is based on a truncated Haar series expansion of $f - f_0$. It also uses Lemma 5.1 below, which provides a strong approximation result for partial sums with respect to a system of dyadic subintervals of $[A,B]$.

Define, for $j\ge0$ and $k=0,\dots,2^j$, $s_{j,k} = A + k2^{-j}(B-A)$, and

$$I_{j,k} = (s_{j,k-1},s_{j,k}], \qquad k=1,\dots,2^j.$$

The Haar basis functions are defined via indicators as

$$h_0 = (B-A)^{-1/2}\mathbf 1_{[A,B]}, \qquad h_{j,k} = (B-A)^{-1/2}2^{j/2}\big(\mathbf 1_{I_{j+1,2k-1}} - \mathbf 1_{I_{j+1,2k}}\big) \qquad (j\ge0;\ k=1,\dots,2^j).$$
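The Haar system above is easy to implement directly. The sketch below computes $c_{j,k}(g)$ by the midpoint rule on each half of $I_{j,k}$. It then checks the coefficient bound of Lemma 5.3(ii) below for the illustrative choice $g(x)=0.1\sin(3x)$ on $[A,B]=[-1,1]$, so that $\|g\|_\infty\le0.1$ and $\|g'\|_\infty\le0.3$. The test function and interval are hypothetical.

```python
import math


def haar_coefficient(g, j, k, A, B, steps=2000):
    """c_{j,k}(g) = int g h_{j,k}, where h_{j,k} = (B-A)^(-1/2) 2^(j/2) (1_{I_{j+1,2k-1}} - 1_{I_{j+1,2k}})
    and I_{j,k} = (s_{j,k-1}, s_{j,k}] with s_{j,k} = A + k 2^(-j) (B - A); midpoint rule per half."""
    width = (B - A) * 2.0 ** (-j)
    left = A + (k - 1) * width
    half = width / 2.0
    h = half / steps
    left_half = sum(g(left + (m + 0.5) * h) for m in range(steps)) * h
    right_half = sum(g(left + half + (m + 0.5) * h) for m in range(steps)) * h
    return (B - A) ** -0.5 * 2.0 ** (j / 2.0) * (left_half - right_half)


def lemma_bound(j, A, B, sup_g, sup_dg):
    """min{(B-A)^(1/2) 2^(-j/2) ||g||, (B-A)^(3/2) 2^(-3j/2-2) ||g'||}."""
    return min((B - A) ** 0.5 * 2.0 ** (-j / 2.0) * sup_g,
               (B - A) ** 1.5 * 2.0 ** (-1.5 * j - 2) * sup_dg)


def g(x):
    return 0.1 * math.sin(3.0 * x)


A, B = -1.0, 1.0
worst_ratio = 0.0
for j in range(6):
    for k in range(1, 2 ** j + 1):
        c = abs(haar_coefficient(g, j, k, A, B))
        worst_ratio = max(worst_ratio, c / lemma_bound(j, A, B, 0.1, 0.3))
print(worst_ratio)  # should not exceed 1
```

At fine scales the ratio approaches $|g'|/\|g'\|_\infty$ near the points where $|\cos 3x|\approx1$. This illustrates that the derivative bound in (ii) is essentially sharp for smooth $g$.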

With a choice of the finest scale of the expansion, $j^* = j^*(n)$, described at the end of the proof of Theorem 5.1, we obtain a truncated Haar series expansion of $g = f - f_0$ as

$$g(x) = c_0(g)h_0(x) + \sum_{j=0}^{j^*}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}(x) + r_{j^*}(x),$$

where $c_0(g) = \int_A^Bg(t)h_0(t)\,dt$, $c_{j,k}(g) = \int_A^Bg(t)h_{j,k}(t)\,dt$, and $r_{j^*}(x)$ is the residual term. This yields that



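$$\begin{aligned}|S^{1,(l)} - S^{2,(l)}| &\le |c_0(g)|\,\Big|\sum_{i\in I_l}h_0(X_{i-1})l_p'(\varepsilon_i) - \sum_{i\in I_l}h_0(\xi_i)l_q'(\eta_i)\Big|\\ &\quad + \sum_{j=0}^{j^*}\sum_{k=1}^{2^j}|c_{j,k}(g)|\,\Big|\sum_{i\in I_l}h_{j,k}(X_{i-1})l_p'(\varepsilon_i) - \sum_{i\in I_l}h_{j,k}(\xi_i)l_q'(\eta_i)\Big|\\ &\quad + \Big|\sum_{i\in I_l}r_{j^*}(X_{i-1})l_p'(\varepsilon_i) - \sum_{i\in I_l}r_{j^*}(\xi_i)l_q'(\eta_i)\Big|\\ &\le (B-A)^{-1/2}|c_0(g)|\,\big|Z^{1,(l)} - Z^{2,(l)}\big|\\ &\quad + (B-A)^{-1/2}\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}2^{j/2}|c_{j,k}(g)|\Big(\big|Z^{1,(l)}_{j+1,2k-1} - Z^{2,(l)}_{j+1,2k-1}\big| + \big|Z^{1,(l)}_{j+1,2k} - Z^{2,(l)}_{j+1,2k}\big|\Big)\\ &\quad + \Big|\sum_{i\in I_l}r_{j^*}(X_{i-1})l_p'(\varepsilon_i) - \sum_{i\in I_l}r_{j^*}(\xi_i)l_q'(\eta_i)\Big|,\end{aligned}$$

where

$$Z^{1,(l)}_{j,k} = \sum_{i\in I_l}\mathbf 1_{I_{j,k}}(X_{i-1})\,l_p'(\varepsilon_i), \qquad Z^{2,(l)}_{j,k} = \sum_{i\in I_l}\mathbf 1_{I_{j,k}}(\xi_i)\,l_q'(\eta_i),$$

and $Z^{i,(l)} = Z^{i,(l)}_{0,1}$, $i=1,2$.

While the approximation-theoretic calculations are rather straightforward, the strong approximation result will require a lengthy proof based on Skorokhod embedding techniques. Let $\mathcal I_n = \{(j,k) : 0\le j\le j^*,\ k=1,\dots,2^j\}$.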

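Lemma 5.1. Suppose that assumptions (A1)–(A3) are satisfied. Then there exists a construction of the random variables $X_0,\dots,X_n$ and $(Y_1,\xi_1),\dots,(Y_n,\xi_n)$ on a common probability space such that, for $1\le l\le K_n$,

$$\inf_{f_0\in\Sigma}\inf_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,inf}P_{f_0}\Big(\big|Z^{1,(l)}_{j,k} - Z^{2,(l)}_{j,k}\big|\le C_\lambda(m_l2^{-j})^{1/4}\log m_l,\ \forall(j,k)\in\mathcal I_n\,\Big|\,\mathcal F_l\Big) = 1 - O(m_l^{-\lambda}).$$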
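To formulate the next theorem, we define

$$S^{3,(l)} = \sum_{i\in I_l}g(t_{n,i})\,l_q'(\eta_i).$$

Theorem 5.2. Suppose that assumptions (A1)–(A3) are satisfied. Let $\lambda>1$ be a constant. Then there are versions of the random variables $X_0,\dots,X_n$ and $Y_{n,1},\dots,Y_{n,n}$ on a common probability space such that, for $1\le l\le K_n$,

$$\sup_{f_0\in\Sigma}\sup_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,sup}P_{f_0}\big(|S^{1,(l)} - S^{3,(l)}| > c(\lambda)r_n\,\big|\,\mathcal F'_l\big) = O(m_l^{-\lambda}),$$

where $r_n = (\gamma_n)^{1/2}(\gamma_n')^{1/2}m_l^{1/4}\log m_l + m_l^{-\lambda}$ and $c(\lambda)$ is a constant depending only on $\lambda$.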
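The assertion of this theorem is a consequence of the following lemma. Set

$$Z^{3,(l)}_{j,k} = \sum_{i\in I_l}\mathbf 1_{I_{j,k}}(t_{n,i})\,l_q'(\eta_i).$$

Lemma 5.2. Suppose that assumptions (A1)–(A3) are satisfied. Then there exists a construction of the random variables $X_0,\dots,X_n$ and $Y_{n,1},\dots,Y_{n,n}$ on a common probability space such that, for $1\le l\le K_n$,

$$\inf_{f_0\in\Sigma}\inf_{f\in\Sigma^n_{f_0}}\operatorname*{ess\,inf}P_{f_0}\Big(\big|Z^{1,(l)}_{j,k} - Z^{3,(l)}_{j,k}\big|\le C_\lambda(m_l2^{-j})^{1/4}\log m_l,\ \forall(j,k)\in\mathcal I_n\,\Big|\,\mathcal F'_l\Big) = 1 - O(m_l^{-\lambda}).$$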

The proofs of Lemmas 5.1 and 5.2 make use of a multiscale version of the Skorokhod embedding and are similar to the construction in [22]. We postpone these proofs to Section 5.2. Now we shall give proofs of Theorems 5.1 and 5.2.

5.1. Proofs of Theorems 5.1 and 5.2. As already indicated, the proofs 
of the theorems split into an approximation-theoretic and a stochastic part. 
The following lemma contains the approximation-theoretic facts needed for 
the proofs of Theorems 5.1 and 5.2. 

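Lemma 5.3. Let $c_0(g)$ and $c_{j,k}(g)$ be the Haar coefficients of a function $g$ defined above. Then:

(i) $|c_0(g)|\le(B-A)^{1/2}\|g\|_\infty$,

(ii) $|c_{j,k}(g)|\le\min\bigl\{(B-A)^{1/2}2^{-j/2}\|g\|_\infty,\ (B-A)^{3/2}2^{-3j/2-2}\|g'\|_\infty\bigr\}$,

(iii) $\bigl\|g-\bigl(c_0(g)h_0+\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}\bigr)\bigr\|_\infty\le(B-A)2^{-j^*-2}\|g'\|_\infty$.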

Proof. Assertion (i) follows from

$$|c_0(g)|\le\|g\|_\infty\int|h_0(t)|\,dt\le(B-A)^{1/2}\|g\|_\infty.$$

Analogously, we obtain that

$$|c_{j,k}(g)|\le\|g\|_\infty\int|h_{j,k}(t)|\,dt\le(B-A)^{1/2}2^{-j/2}\|g\|_\infty.$$

Furthermore, it follows that

$$
\begin{aligned}
|c_{j,k}(g)|&\le(B-A)^{-1/2}2^{j/2}\Bigl|\int g(t)\bigl(\mathbf 1_{I_{j+1,2k-1}}(t)-\mathbf 1_{I_{j+1,2k}}(t)\bigr)\,dt\Bigr|\\
&\le(B-A)^{-1/2}2^{j/2}\int_{I_{j+1,2k-1}}\bigl|g(t)-g\bigl(t+(B-A)2^{-j-1}\bigr)\bigr|\,dt\\
&\le(B-A)^{3/2}2^{-3j/2-2}\|g'\|_\infty,
\end{aligned}
$$

which yields (ii).

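Finally, we obtain from (ii) that, for $x\in[A,B]$,

$$
\begin{aligned}
\Bigl|g(x)-\Bigl(c_0(g)h_0(x)+\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}(x)\Bigr)\Bigr|
&=\Bigl|\sum_{j=j^*+1}^{\infty}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}(x)\Bigr|\\
&\le\sum_{j=j^*+1}^{\infty}(B-A)^{3/2}2^{-3j/2-2}\|g'\|_\infty\,(B-A)^{-1/2}2^{j/2}\\
&=(B-A)2^{-j^*-2}\|g'\|_\infty,
\end{aligned}
$$

since, for each $j$, at most one of the functions $h_{j,1},\dots,h_{j,2^j}$ does not vanish at $x$, and $|h_{j,k}(x)|\le(B-A)^{-1/2}2^{j/2}$. □

Now we are in a position to prove Theorems 5.1 and 5.2.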
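The three bounds of Lemma 5.3 are easy to check numerically. The following Python sketch assumes the Haar system $h_0=(B-A)^{-1/2}\mathbf 1_{[A,B]}$ and $h_{j,k}=(B-A)^{-1/2}2^{j/2}(\mathbf 1_{I_{j+1,2k-1}}-\mathbf 1_{I_{j+1,2k}})$ with $I_{j,k}=[A+(k-1)2^{-j}(B-A),\,A+k2^{-j}(B-A))$, $k=1,\dots,2^j$; this explicit indexing, the helper names and the test function $g=\sin$ on $[0,2]$ are choices made only for the illustration. Coefficients are computed exactly from an antiderivative $G$ of $g$.

```python
import math

def interval(A, B, j, k):
    """Dyadic interval I_{j,k} = [A + (k-1)L, A + kL) with L = (B-A) 2^{-j}, k = 1..2^j."""
    L = (B - A) / 2 ** j
    return A + (k - 1) * L, A + k * L

def haar_coefficients(G, A, B, jstar):
    """Haar coefficients c_0(g) and c_{j,k}(g), 0 <= j <= jstar, of g = G' on [A, B]."""
    scale = (B - A) ** -0.5
    c0 = scale * (G(B) - G(A))                      # integral of g * h_0
    c = {}
    for j in range(jstar + 1):
        for k in range(1, 2 ** j + 1):
            a, mid = interval(A, B, j + 1, 2 * k - 1)
            b = interval(A, B, j + 1, 2 * k)[1]
            # integral of g over the left half minus integral over the right half
            c[(j, k)] = scale * 2 ** (j / 2) * ((G(mid) - G(a)) - (G(b) - G(mid)))
    return c0, c

def haar_partial_sum(c0, c, A, B, x):
    """Evaluate c_0 h_0(x) + sum_{j,k} c_{j,k} h_{j,k}(x) for x in [A, B)."""
    scale = (B - A) ** -0.5
    total = c0 * scale
    for (j, k), cjk in c.items():
        a, mid = interval(A, B, j + 1, 2 * k - 1)
        b = interval(A, B, j + 1, 2 * k)[1]
        if a <= x < mid:
            total += cjk * scale * 2 ** (j / 2)
        elif mid <= x < b:
            total -= cjk * scale * 2 ** (j / 2)
    return total

A, B, jstar = 0.0, 2.0, 5
g, G = math.sin, lambda t: -math.cos(t)
g_sup, dg_sup = 1.0, 1.0                            # sup|sin| and sup|cos| on [0, 2]
c0, c = haar_coefficients(G, A, B, jstar)

ok_i = abs(c0) <= (B - A) ** 0.5 * g_sup
ok_ii = all(
    abs(cjk) <= min((B - A) ** 0.5 * 2 ** (-j / 2) * g_sup,
                    (B - A) ** 1.5 * 2 ** (-1.5 * j - 2) * dg_sup) + 1e-12
    for (j, k), cjk in c.items())
grid = [A + (B - A) * t / 1000 for t in range(1000)]
err = max(abs(g(x) - haar_partial_sum(c0, c, A, B, x)) for x in grid)
ok_iii = err <= (B - A) * 2 ** (-jstar - 2) * dg_sup + 1e-12
print(ok_i, ok_ii, ok_iii)
```

Since the partial sum up to level $j^*$ is the average of $g$ over the dyadic cells of length $(B-A)2^{-j^*-1}$, bound (iii) is simply the distance from $g(x)$ to such a local average.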



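Proof of Theorem 5.1. Define

$$\widetilde S^{1,(l)}=\sum_{i\in\mathcal I_l}\Bigl(c_0(g)h_0(X_{i-1})+\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}(X_{i-1})\Bigr)l_p(\varepsilon_i),$$

$$\widetilde S^{2,(l)}=\sum_{i\in\mathcal I_l}\Bigl(c_0(g)h_0(Y_i)+\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}c_{j,k}(g)h_{j,k}(Y_i)\Bigr)l_q(\eta_i)$$

and

$$R^{i,(l)}=S^{i,(l)}-\widetilde S^{i,(l)},\qquad i=1,2.$$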



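Define the event

$$D_l:=\bigl\{|Z^{1,(l)}_{j,k}-Z^{2,(l)}_{j,k}|\le C_\lambda(m_l2^{-j})^{1/4}\log m_l,\ \forall (j,k)\in\mathcal T_n\bigr\},$$

where $C_\lambda$ is a constant. By Lemma 5.1, $P_{f_0}(D_l^c\,|\,\mathcal F_l)=O(m_l^{-\lambda})$ for a suitable choice of $C_\lambda$. By (i) and (ii) of Lemma 5.3, on the set $D_l$ it holds that

$$
\begin{aligned}
\bigl|\widetilde S^{1,(l)}-\widetilde S^{2,(l)}\bigr|
\le{}&(B-A)^{-1/2}|c_0(g)|\,\bigl|Z^{1,(l)}_{0,1}-Z^{2,(l)}_{0,1}\bigr|\\
&+\sum_{j=0}^{j^*}\sum_{k=1}^{2^j}(B-A)^{-1/2}2^{j/2}|c_{j,k}(g)|\Bigl(\bigl|Z^{1,(l)}_{j+1,2k-1}-Z^{2,(l)}_{j+1,2k-1}\bigr|+\bigl|Z^{1,(l)}_{j+1,2k}-Z^{2,(l)}_{j+1,2k}\bigr|\Bigr)\\
\le{}&C_\lambda\|g\|_\infty m_l^{1/4}\log m_l\\
&+C_\lambda\sum_{j=0}^{j^*}2^{3j/2}\min\bigl\{2^{-j/2}\|g\|_\infty,\ (B-A)2^{-3j/2-2}\|g'\|_\infty\bigr\}\,(m_l2^{-j})^{1/4}\log m_l\\
={}&O\Bigl(\bigl(\|g\|_\infty+\|g\|_\infty^{1/4}\|g'\|_\infty^{3/4}\bigr)\,m_l^{1/4}\log m_l\Bigr).
\end{aligned}
$$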
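The latter proves that, with some constant $c(\lambda)$ depending on $\lambda$,

$$P_{f_0}\bigl(|\widetilde S^{1,(l)}-\widetilde S^{2,(l)}|>c(\lambda)\,r_n'\,\big|\,\mathcal F_l\bigr)=O(m_l^{-\lambda}),\tag{32}$$

where $r_n'=(\gamma_n)^{1/4}(\gamma_n')^{3/4}m_l^{1/4}\log m_l$. By (iii) of Lemma 5.3 it holds that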

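$$P_{f_0}\bigl(|R^{1,(l)}|>m_l^{-\lambda}\bigr)\le m_l^{\lambda}E_{f_0}|R^{1,(l)}|\le m_l^{\lambda}(B-A)2^{-j^*-2}\|g'\|_\infty\sum_{i\in\mathcal I_l}E|l_p(\varepsilon_i)|.\tag{33}$$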

Choosing the finest level $j^*=j^*(n)=c^*\log m_l$ with some sufficiently large $c^*$, we obtain that

$$P_{f_0}\bigl(|R^{1,(l)}|>m_l^{-\lambda}\bigr)=O(m_l^{-\lambda}).\tag{34}$$

Since the above bounds are uniform in $f_0\in\mathcal S$, from (32)–(34) and a similar bound for $R^{2,(l)}$ we conclude the assertion. □
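The level sum in the proof of Theorem 5.1 balances two regimes: coarse levels, where the bound $2^{-j/2}\|g\|_\infty$ on the Haar coefficients is the smaller one, and fine levels, where $(B-A)2^{-3j/2-2}\|g'\|_\infty$ is. Assuming that the level-$j$ term of that sum is $2^{3j/2}\min\{2^{-j/2}\|g\|_\infty,(B-A)2^{-3j/2-2}\|g'\|_\infty\}(m_l2^{-j})^{1/4}\log m_l$, the following Python sketch evaluates it with the common factor $C_\lambda m_l^{1/4}\log m_l$ removed, for $a=\|g\|_\infty$, $b=\|g'\|_\infty$ and $B-A=1$, and compares it with $a+a^{1/4}b^{3/4}$. The grid of values is arbitrary and the sketch only illustrates the order of magnitude; it is not part of the proof.

```python
def level_sum(a, b, width=1.0, jmax=200):
    """sum_j 2^{3j/2} min(2^{-j/2} a, width 2^{-3j/2-2} b) 2^{-j/4}, truncated at jmax."""
    return sum(2 ** (1.5 * j)
               * min(2 ** (-0.5 * j) * a, width * 2 ** (-1.5 * j - 2) * b)
               * 2 ** (-0.25 * j)
               for j in range(jmax + 1))

ratios = []
for a in (1e-3, 1e-1, 1.0, 10.0):                 # a plays the role of ||g||_inf
    for b in (1e-3, 1.0, 1e2, 1e4, 1e6):          # b plays the role of ||g'||_inf
        ratios.append(level_sum(a, b) / (a + a ** 0.25 * b ** 0.75))
print(max(ratios))
```

Both geometric pieces of the sum are dominated by the level where $2^{j}\approx b/(4a)$, which is where the mixed term $a^{1/4}b^{3/4}$ comes from.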

Theorem 5.2 can be proved in a similar way. 

5.2. Proofs of Lemmas 5.1 and 5.2. We prove Lemma 5.1 only for $l=1$, since the proof for $l>1$ is completely analogous. The proof of Lemma 5.2 then requires only some obvious modifications and therefore will not be described here. To simplify notation we drop the index $l$ in the following; that is, we write $Z^1_{j,k}$, $Z^2_{j,k}$, $m$ instead of $Z^{1,(l)}_{j,k}$, $Z^{2,(l)}_{j,k}$, $m_l$, respectively.



Proof of Lemma 5.1. Conditional on $X_0$ (which represents the information contained in $\mathcal F_0$), we construct a pairing of $X_1,\dots,X_m$ and $(Y_1,\zeta_1),\dots,(Y_m,\zeta_m)$ such that

$$\inf_{f_0\in\mathcal S}\operatorname{ess\,sup}P_{f_0}\Bigl(|Z^1_{j,k}-Z^2_{j,k}|\le C_\lambda(m2^{-j})^{1/4}\log m,\ \forall (j,k)\in\mathcal T_n\,\Big|\,X_0\Bigr)=1-O(m^{-\lambda})$$

is satisfied. In the following, all estimates are to be understood to hold uniformly in $f_0\in\mathcal S$.
The pairing of the random variables of both models is organized by a simultaneous Skorokhod embedding of the $l_p(\varepsilon_i)$ and the $l_q(\eta_i)$ in a common set of Wiener processes $W_{j,k}$ assigned to the intervals $I_{j,k}$. We describe this embedding in detail for the autoregressive process. The embedding of the $l_q(\eta_i)$ from the regression model is completely analogous and will be mentioned only briefly. Then we draw conclusions for the rate of approximation of $Z^1_{j,k}$ by $Z^2_{j,k}$, which will conclude the proof. An embedding scheme like this has already been developed in [22], in a different context. In view of some modifications, and since we intend to provide a self-contained paper, we give a full proof of this lemma.

Let $W_{j,k}$, $(j,k)\in\mathcal T_n$, be independent Wiener processes. Apart from the coarsest resolution scale, which corresponds to $j=0$, we use each of these processes only on a finite time interval $[0,T_{j,k}]$, where the particular (nonrandom) values of the $T_{j,k}$ will be specified in part (iv) below. For the time being it is only important to know that $T_{0,1}=\infty$.

(i) Embedding of $l_p(\varepsilon_1)$ and construction of $X_1$.

First we define $l_p(\varepsilon_1)$ by a Skorokhod embedding in the Wiener processes mentioned above. Since $l_p(\varepsilon_1)$ does not necessarily determine $X_1$ uniquely, we may have to use an additional randomization to get $X_1$.

Let $k_1$ be the random index with $X_0\in I_{j^*,k_1}$. Now we are going to represent $l_p(\varepsilon_1)$ by increments of the Wiener processes, preferably by those of $W_{j^*,k_1}$. However, since we want to use $W_{j^*,k_1}$ only up to the prespecified time $T_{j^*,k_1}$, it might happen that this is not enough for representing $l_p(\varepsilon_1)$. In this case we additionally use a certain stretch of the process $W_{j^*-1,[k_1/2]}$, and so on. The Wiener processes which are potentially used for the representation of $l_p(\varepsilon_1)$ correspond to a containment relation of the dyadic intervals,

$$I_{j^*,k}\subseteq I_{j^*-1,[k/2]}\subseteq\cdots\subseteq I_{0,[k2^{-j^*}]},$$

where $[a]$ denotes the smallest integer not less than $a$ (so that $[k2^{-j^*}]=1$ for $1\le k\le2^{j^*}$). This means that we represent $l_p(\varepsilon_1)$ by the following Wiener process:

$$W^{(1)}(s)=W_{j^*,k_1}(s)\qquad\text{if }0\le s\le T_{j^*,k_1},$$

and, for $j=j^*-1,\dots,0$,

$$W^{(1)}(s)=\sum_{u=j+1}^{j^*}W_{u,[k_12^{u-j^*}]}\bigl(T_{u,[k_12^{u-j^*}]}\bigr)+W_{j,[k_12^{j-j^*}]}\Bigl(s-\sum_{u=j+1}^{j^*}T_{u,[k_12^{u-j^*}]}\Bigr)\qquad\text{if }\sum_{u=j+1}^{j^*}T_{u,[k_12^{u-j^*}]}<s\le\sum_{u=j}^{j^*}T_{u,[k_12^{u-j^*}]}.$$

($W^{(1)}$ is indeed a Wiener process on $[0,\infty)$, since $T_{0,1}=\infty$.)
According to Lemma A.2 of [17], there exists a stopping time $\tau^{(1)}$ such that the distribution of $W^{(1)}(\tau^{(1)})$ is equal to the conditional distribution of $l_p(\varepsilon_1)$ given $X_0$. We define $\varepsilon_1$ in such a way that

$$l_p(\varepsilon_1)=W^{(1)}(\tau^{(1)}).$$

[This is achieved by first setting $l_p(\varepsilon_1)$ equal to $W^{(1)}(\tau^{(1)})$ and then defining $\varepsilon_1$ with the aid of an additional randomization according to its conditional distribution given $l_p(\varepsilon_1)$.] Finally, according to the model equation under $f_0$, we set $X_1=f_0(X_0)+\varepsilon_1$.
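The general embedding of Lemma A.2 in [17] is not reproduced here. As a toy illustration of what a Skorokhod embedding does, the following Python sketch embeds the symmetric two-point law $P(X=a)=P(X=-a)=1/2$ by stopping a Wiener process at its first exit time from $(-a,a)$. The Wiener process is approximated by a simple random walk with spatial step $h$ and time step $h^2$ (an assumption of the sketch); for this walk the stopped value is exactly $\pm a$ and the mean running time equals $a^2=EX^2$, the discrete analogue of the identity $E\tau=EW(\tau)^2$ for such embeddings. The function name and parameters are illustrative.

```python
import random

def embed_two_point(a, h, rng):
    """Stop a random-walk approximation of a Wiener process at its exit time from (-a, a).

    Spatial step h, time step h**2; a / h must be an integer.
    Returns (W(tau), tau); W(tau) equals +a or -a with probability 1/2 each.
    """
    n = round(a / h)
    if abs(n * h - a) > 1e-12:
        raise ValueError("a / h must be an integer")
    pos, steps = 0, 0
    while abs(pos) < n:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return pos * h, steps * h * h

rng = random.Random(12345)
samples = [embed_two_point(1.0, 0.1, rng) for _ in range(2000)]
values = [w for w, _ in samples]
taus = [t for _, t in samples]
share_plus = sum(w > 0 for w in values) / len(values)
mean_tau = sum(taus) / len(taus)
print(f"P(W(tau) = +1) ~ {share_plus:.3f}, E tau ~ {mean_tau:.3f} (theory: 0.5 and 1.0)")
```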

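To explain the following steps in a formally correct way, we introduce stopping times $\tau^{(i)}_{j,k}$, $i=1,\dots,m$, assigned to the corresponding Wiener processes $W_{j,k}$. Define

$$\tau^{(0)}_{j,k}=0\qquad\text{for all }(j,k)\in\mathcal T_n.$$

To get $\tau^{(1)}_{j,k}$ we redefine all those $\tau^{(0)}_{j,k}$ which are assigned to Wiener processes $W_{j,k}$ that were used for representing $l_p(\varepsilon_1)$. According to the above construction we set

$$\tau^{(1)}_{j^*,k_1}=\min\bigl\{\tau^{(1)},\,T_{j^*,k_1}\bigr\}.$$

We redefine further, for $j<j^*$,

$$
\tau^{(1)}_{j,[k_12^{j-j^*}]}=
\begin{cases}
\min\Bigl\{\tau^{(1)}-\sum_{u=j+1}^{j^*}T_{u,[k_12^{u-j^*}]},\ T_{j,[k_12^{j-j^*}]}\Bigr\}, & \text{if }\sum_{u=j+1}^{j^*}T_{u,[k_12^{u-j^*}]}<\tau^{(1)},\\
0, & \text{otherwise.}
\end{cases}
$$

The remaining stopping times $\tau^{(1)}_{j,l}$ with $l\ne[k_12^{j-j^*}]$ keep their preceding values $\tau^{(0)}_{j,l}=0$.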

This procedure will be successively repeated for all other $\varepsilon_i$'s, with the modification that we use only those parts of the Wiener processes which are still untouched by the previous construction steps.

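(ii) Embedding of $l_p(\varepsilon_i)$ and construction of $X_i$.

Assume that $X_0,\dots,X_{i-1}$ are already defined. Let $k_i$ be the random index with $X_{i-1}\in I_{j^*,k_i}$. Now we represent $l_p(\varepsilon_i)$ by those parts of $W_{j^*,k_i},W_{j^*-1,[k_i/2]},\dots,W_{0,[k_i2^{-j^*}]}$ which have not been used so far. First note that, because of the strong Markov property, the remaining increments $W_{j,[k_i2^{j-j^*}]}\bigl(s+\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}\bigr)-W_{j,[k_i2^{j-j^*}]}\bigl(\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}\bigr)$ form independent Wiener processes, which are also independent of $X_0,\dots,X_{i-1}$. Hence, gluing these parts together we obtain a Wiener process on $[0,\infty)$ which is independent of $X_0,\dots,X_{i-1}$. This process is given as

$$W^{(i)}(s)=W_{j^*,k_i}\bigl(s+\tau^{(i-1)}_{j^*,k_i}\bigr)-W_{j^*,k_i}\bigl(\tau^{(i-1)}_{j^*,k_i}\bigr)\qquad\text{if }0\le s\le T_{j^*,k_i}-\tau^{(i-1)}_{j^*,k_i},$$

and, for $j=j^*-1,\dots,0$,

$$
\begin{aligned}
W^{(i)}(s)={}&\sum_{u=j+1}^{j^*}\Bigl\{W_{u,[k_i2^{u-j^*}]}\bigl(T_{u,[k_i2^{u-j^*}]}\bigr)-W_{u,[k_i2^{u-j^*}]}\bigl(\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr)\Bigr\}\\
&+W_{j,[k_i2^{j-j^*}]}\Bigl(s-\sum_{u=j+1}^{j^*}\bigl(T_{u,[k_i2^{u-j^*}]}-\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr)+\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}\Bigr)-W_{j,[k_i2^{j-j^*}]}\bigl(\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}\bigr)
\end{aligned}
$$

if $\sum_{u=j+1}^{j^*}\bigl(T_{u,[k_i2^{u-j^*}]}-\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr)<s\le\sum_{u=j}^{j^*}\bigl(T_{u,[k_i2^{u-j^*}]}-\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr)$.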



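There exists a stopping time $\tau^{(i)}$ such that $W^{(i)}(\tau^{(i)})$ has the same distribution as the conditional distribution of $l_p(\varepsilon_i)$ given $X_0,\dots,X_{i-1}$. We define $\varepsilon_i$ in such a way that

$$l_p(\varepsilon_i)=W^{(i)}(\tau^{(i)})$$

and set $X_i=f_0(X_{i-1})+\varepsilon_i$. [The definition of $\varepsilon_i$ is again achieved in two steps, by first setting $l_p(\varepsilon_i)$ equal to $W^{(i)}(\tau^{(i)})$ and then defining $\varepsilon_i$ according to its conditional distribution.]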

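To complete this construction, it remains to define the stopping times $\tau^{(i)}_{j,k}$. These stopping times indicate up to which point the Wiener processes have been used in the first $i$ steps. Accordingly, for $j=j^*,\dots,0$, we set

$$
\tau^{(i)}_{j,[k_i2^{j-j^*}]}=
\begin{cases}
\min\Bigl\{\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}+\tau^{(i)}-\sum_{u=j+1}^{j^*}\bigl(T_{u,[k_i2^{u-j^*}]}-\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr),\ T_{j,[k_i2^{j-j^*}]}\Bigr\}, & \text{if }\sum_{u=j+1}^{j^*}\bigl(T_{u,[k_i2^{u-j^*}]}-\tau^{(i-1)}_{u,[k_i2^{u-j^*}]}\bigr)<\tau^{(i)},\\
\tau^{(i-1)}_{j,[k_i2^{j-j^*}]}, & \text{otherwise.}
\end{cases}
$$

For all $(j,l)$ with $l\ne[k_i2^{j-j^*}]$ we define

$$\tau^{(i)}_{j,l}=\tau^{(i-1)}_{j,l}.$$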
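The bookkeeping of the stopping times $\tau^{(i)}_{j,k}$ amounts to a greedy allocation: the running time $\tau^{(i)}$ is drawn first from the finest process $W_{j^*,k_i}$, up to its capacity $T_{j^*,k_i}$, and the rest from the coarser processes along the dyadic chain. The following Python sketch mirrors this update of the stopping times; the function names, the dictionary representation and the capacities in the usage example are hypothetical, and $[a]$ is computed as a ceiling so that the indices start at 1.

```python
import math

def dyadic_chain(jstar, k):
    """Indices (j, [k 2^{j-j*}]) of I_{j*,k} ⊆ I_{j*-1,.} ⊆ ... ⊆ I_{0,1}, finest first."""
    return [(j, -(-k // 2 ** (jstar - j))) for j in range(jstar, -1, -1)]

def allocate(tau_i, chain, T, used):
    """Update used[(j, k)] from tau^(i-1)_{j,k} to tau^(i)_{j,k} along the chain.

    Each process W_{j,k} may run up to its capacity T[(j, k)] (math.inf at level 0).
    Returns the amount of time drawn from each process in this step.
    """
    remaining, drawn = tau_i, {}
    for key in chain:
        if remaining <= 0:
            break
        take = min(T[key] - used[key], remaining)   # unused part of W_{j,k}
        used[key] += take
        drawn[key] = take
        remaining -= take
    return drawn

jstar = 2
T = {(0, 1): math.inf}
T.update({(1, k): 0.5 for k in (1, 2)})
T.update({(2, k): 1.0 for k in (1, 2, 3, 4)})
used = {key: 0.0 for key in T}

d1 = allocate(1.8, dyadic_chain(jstar, 3), T, used)   # X_0 in I_{2,3}
d2 = allocate(0.4, dyadic_chain(jstar, 4), T, used)   # X_1 in I_{2,4}
d3 = allocate(0.2, dyadic_chain(jstar, 3), T, used)   # X_2 in I_{2,3} again
print(d1, d2, d3, used, sep="\n")
```

In the third step the stretches of $W_{2,3}$ and $W_{1,2}$ are already used up, so the whole running time is taken from $W_{0,1}$, whose capacity is infinite.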



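After embedding $l_p(\varepsilon_1),\dots,l_p(\varepsilon_m)$ we arrive at stopping times $\tau_{j,k}=\tau^{(m)}_{j,k}$. The partial sums are connected to the Wiener processes by the relation

$$Z^1_{j,k}=\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}W_{u,v}(\tau_{u,v})+\sum_{i:\,1\le i\le m,\ X_{i-1}\in I_{j,k}}\ \sum_{(u,v):\,I_{u,v}\supsetneq I_{j,k}}\bigl[W_{u,v}(\tau^{(i)}_{u,v})-W_{u,v}(\tau^{(i-1)}_{u,v})\bigr].\tag{35}$$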

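(iii) Embedding of $l_q(\eta_1),\dots,l_q(\eta_m)$ and construction of $(Y_1,\zeta_1),\dots,(Y_m,\zeta_m)$.

This is done in complete analogy to the construction described above. We define again stopping times $\tilde\tau^{(i)}_{j,k}$ (with $\tilde\tau_{j,k}=\tilde\tau^{(m)}_{j,k}$) and obtain the following representation of the partial sums:

$$Z^2_{j,k}=\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}W_{u,v}(\tilde\tau_{u,v})+\sum_{i:\,1\le i\le m,\ Y_i\in I_{j,k}}\ \sum_{(u,v):\,I_{u,v}\supsetneq I_{j,k}}\bigl[W_{u,v}(\tilde\tau^{(i)}_{u,v})-W_{u,v}(\tilde\tau^{(i-1)}_{u,v})\bigr].\tag{36}$$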

(iv) Choice of the values for $T_{j,k}$.

To motivate our particular choice of the $T_{j,k}$ described below, we first consider two extreme cases. If $T_{j^*,k}=\infty$ for all $k$, then $Z^1_{j^*,k}$ and $Z^2_{j^*,k}$ are both completely represented by $W_{j^*,k}$. This indeed leads to a satisfactorily close connection of $Z^1_{j^*,k}$ and $Z^2_{j^*,k}$. On the other hand, this choice is unfavorable at scales $j\ne j^*$. Although we immediately get the upper estimate

$$|Z^1_{j,k}-Z^2_{j,k}|\le\sum_{l:\,I_{j^*,l}\subseteq I_{j,k}}|Z^1_{j^*,l}-Z^2_{j^*,l}|,$$

the difference between $Z^1_{j,k}$ and $Z^2_{j,k}$ will be unnecessarily large. This is because, for $j\ne j^*$, $Z^1_{j,k}$ and $Z^2_{j,k}$ are then represented by too many different stretches of the Wiener processes $W_{j^*,l}$ with $I_{j^*,l}\subseteq I_{j,k}$.

On the other hand, if the $T_{j^*,k}$ are rather small, then $Z^1_{j^*,k}$ and $Z^2_{j^*,k}$ will be represented in large parts by stretches of Wiener processes $W_{u,v}$ which correspond to intervals $I_{u,v}\supsetneq I_{j^*,k}$ with $u<j^*$. Once we are on a coarser scale $u<j^*$, we cannot guarantee that $Z^1_{j^*,k}$ and $Z^2_{j^*,k}$ are (mostly) generated by identical parts of the Wiener processes. Consequently, we would again get a suboptimal connection, this time for $Z^1_{j^*,k}$ and $Z^2_{j^*,k}$.
To find a good compromise between these two conflicting aims, we choose the $T_{j,k}$ as large as possible, but with the additional property that, for $j\ne0$, the stretches $[0,T_{j,k}]$ are used up with high probability in the representation of both $l_p(\varepsilon_1),\dots,l_p(\varepsilon_m)$ and $l_q(\eta_1),\dots,l_q(\eta_m)$. More precisely, we choose the $T_{j,k}$ in such a way that

$$\operatorname{ess\,sup}P_{f_0}\Bigl(\sum_{i=1}^m\tau^{(i)}I(X_{i-1}\in I_{j,k})<\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}T_{u,v}\ \text{for some }(j,k)\in\mathcal T_n\setminus\{(0,1)\}\,\Big|\,X_0\Bigr)=O(m^{-\lambda})\tag{37}$$

and

$$P_{f_0}\Bigl(\sum_{i=1}^m\tilde\tau^{(i)}I(Y_i\in I_{j,k})<\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}T_{u,v}\ \text{for some }(j,k)\in\mathcal T_n\setminus\{(0,1)\}\Bigr)=O(m^{-\lambda}).\tag{38}$$

To this end, we first study the stochastic behavior of the above sums of stopping times assigned to the interval $I_{j,k}$.

Recall that the innovations $\varepsilon_i$ are assumed to be independent. According to the construction of the Skorokhod embedding described in [17], Appendix A.1, the randomness of $\tau^{(i)}$ is driven by some $U_i\sim\mathrm{Uniform}[0,1]$ from a sequence of independent random variables and by $\{W^{(i)}(s),\,0\le s\le\tau^{(i)}\}$. The vectors $(X_{i-1},\varepsilon_i)$ are of course $\phi$-mixing, as are the $X_i$. Since, for $i\ne i'$, $\{W^{(i)}(s),\,0\le s\le\tau^{(i)}\}$ and $\{W^{(i')}(s),\,0\le s\le\tau^{(i')}\}$ are composed of disjoint stretches of the Wiener processes $W_{j,k}$ separated by stopping times, the random variables $\tau^{(i)}I(X_{i-1}\in I_{j,k})$ inherit the $\phi$-mixing property from the process $(X_i)$. Hence, we obtain by a Bernstein-type inequality for sums of $\phi$-mixing random variables (see, e.g., [10]) that

$$\operatorname{ess\,sup}P_{f_0}\Bigl(\Bigl|\sum_{i=1}^m\bigl\{\tau^{(i)}I(X_{i-1}\in I_{j,k})-E[\tau^{(i)}I(X_{i-1}\in I_{j,k})]\bigr\}\Bigr|>C_\lambda\sqrt{m2^{-j}}\log m\,\Big|\,X_0\Bigr)=O(m^{-\lambda})\tag{39}$$

and, analogously,

$$P_{f_0}\Bigl(\Bigl|\sum_{i=1}^m\bigl\{\tilde\tau^{(i)}I(Y_i\in I_{j,k})-E[\tilde\tau^{(i)}I(Y_i\in I_{j,k})]\bigr\}\Bigr|>C_\lambda\sqrt{m2^{-j}}\log m\Bigr)=O(m^{-\lambda}).\tag{40}$$
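Define

$$S_{j,k}=\sum_{i=1}^mE\bigl[\tau^{(i)}I(X_{i-1}\in I_{j,k})\bigr]-C_\lambda\sqrt{m2^{-j}}\log m.$$

Furthermore, for $(j,k)\in\mathcal T_n\setminus\{(0,1)\}$, starting from the finest level $j=j^*$ and proceeding to coarser levels, we define

$$T_{j,k}=S_{j,k}-\sum_{(u,v):\,I_{u,v}\subsetneq I_{j,k}}T_{u,v}.$$

Then $S_{j,k}=\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}T_{u,v}$, and by (39) and (40) we obtain (37) and (38).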
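The recursion for the capacities can be illustrated numerically. The following Python sketch assumes, purely for illustration, that $\sum_{i=1}^mE[\tau^{(i)}I(X_{i-1}\in I_{j,k})]=\mu m2^{-j}$ for a constant $\mu$ (cells of equal probability and equal mean running time), and it solves $S_{j,k}=\sum_{(u,v):I_{u,v}\subseteq I_{j,k}}T_{u,v}$ by $T_{j^*,k}=S_{j^*,k}$ and $T_{j,k}=S_{j,k}-S_{j+1,2k-1}-S_{j+1,2k}$ for $0<j<j^*$, with $T_{0,1}=\infty$. It checks this identity and the positivity of the finite capacities; the function names and the values of $m$, $\mu$, $C_\lambda$ and $j^*$ are hypothetical.

```python
import math

def capacities(m, jstar, mu, C):
    """Hypothetical S_{j,k} and T_{j,k}, assuming the expectations equal mu * m * 2^{-j}."""
    S = {}
    for j in range(jstar + 1):
        n_cell = m * 2.0 ** -j                    # m 2^{-j} under the equal-cell assumption
        for k in range(1, 2 ** j + 1):
            S[(j, k)] = mu * n_cell - C * math.sqrt(n_cell) * math.log(m)
    T = {}
    for (j, k), s in S.items():
        if j == 0:
            T[(j, k)] = math.inf                  # T_{0,1} = infinity
        elif j == jstar:
            T[(j, k)] = s
        else:
            T[(j, k)] = s - S[(j + 1, 2 * k - 1)] - S[(j + 1, 2 * k)]
    return S, T

def subtree_sum(T, j, k, jstar):
    """Sum of T_{u,v} over all dyadic intervals I_{u,v} contained in I_{j,k}."""
    return sum(T[(u, v)]
               for u in range(j, jstar + 1)
               for v in range((k - 1) * 2 ** (u - j) + 1, k * 2 ** (u - j) + 1))

m, jstar = 10 ** 6, 6
S, T = capacities(m, jstar, mu=1.0, C=1.0)
print(min(t for (j, _), t in T.items() if j > 0))
```

Under this assumption the intermediate capacities equal $C_\lambda\log m\,\sqrt{m2^{-j}}(\sqrt2-1)$, so they are positive whenever $C_\lambda>0$.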
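(v) Conclusion for $Z^1_{j,k}-Z^2_{j,k}$.

By (35)–(38) we obtain that

$$Z^1_{j,k}=\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}W_{u,v}(T_{u,v})+\sum_{i:\,1\le i\le m,\ X_{i-1}\in I_{j,k}}\ \sum_{(u,v):\,I_{u,v}\supsetneq I_{j,k}}\bigl[W_{u,v}(\tau^{(i)}_{u,v})-W_{u,v}(\tau^{(i-1)}_{u,v})\bigr]\tag{41}$$

and

$$Z^2_{j,k}=\sum_{(u,v):\,I_{u,v}\subseteq I_{j,k}}W_{u,v}(T_{u,v})+\sum_{i:\,1\le i\le m,\ Y_i\in I_{j,k}}\ \sum_{(u,v):\,I_{u,v}\supsetneq I_{j,k}}\bigl[W_{u,v}(\tilde\tau^{(i)}_{u,v})-W_{u,v}(\tilde\tau^{(i-1)}_{u,v})\bigr]\tag{42}$$

are satisfied with a probability exceeding $1-O(m^{-\lambda})$. At this point we see why our particular pairing of the random variables provides a close connection between $Z^1_{j,k}$ and $Z^2_{j,k}$: most of the randomness of $Z^1_{j,k}$ and $Z^2_{j,k}$ is contained in the first terms on the right-hand sides of (41) and (42), respectively. These terms are random, but identical to each other.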

To analyze the difference between the right-hand sides of (41) and (42), we compose the pieces $\{W_{u,v}(s),\,\tau^{(i-1)}_{u,v}\le s\le\tau^{(i)}_{u,v}\}$ and $\{W_{u,v}(s),\,\tilde\tau^{(i-1)}_{u,v}\le s\le\tilde\tau^{(i)}_{u,v}\}$ corresponding to intervals $I_{u,v}\supsetneq I_{j,k}$ to Wiener processes. For fixed $i$, we define



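$$W^{\mathrm{res},i}_{j,k}(s)=W_{j-1,[k/2]}\bigl(s+\tau^{(i-1)}_{j-1,[k/2]}\bigr)-W_{j-1,[k/2]}\bigl(\tau^{(i-1)}_{j-1,[k/2]}\bigr)\qquad\text{if }0\le s\le\tau^{(i)}_{j-1,[k/2]}-\tau^{(i-1)}_{j-1,[k/2]},$$

and, for $l=j-2,\dots,0$,

$$
\begin{aligned}
W^{\mathrm{res},i}_{j,k}(s)={}&\sum_{r=l+1}^{j-1}\Bigl[W_{r,[k2^{r-j}]}\bigl(\tau^{(i)}_{r,[k2^{r-j}]}\bigr)-W_{r,[k2^{r-j}]}\bigl(\tau^{(i-1)}_{r,[k2^{r-j}]}\bigr)\Bigr]\\
&+W_{l,[k2^{l-j}]}\Bigl(s-\sum_{r=l+1}^{j-1}\bigl(\tau^{(i)}_{r,[k2^{r-j}]}-\tau^{(i-1)}_{r,[k2^{r-j}]}\bigr)+\tau^{(i-1)}_{l,[k2^{l-j}]}\Bigr)-W_{l,[k2^{l-j}]}\bigl(\tau^{(i-1)}_{l,[k2^{l-j}]}\bigr)
\end{aligned}
$$

if $\sum_{r=l+1}^{j-1}\bigl(\tau^{(i)}_{r,[k2^{r-j}]}-\tau^{(i-1)}_{r,[k2^{r-j}]}\bigr)<s\le\sum_{r=l}^{j-1}\bigl(\tau^{(i)}_{r,[k2^{r-j}]}-\tau^{(i-1)}_{r,[k2^{r-j}]}\bigr)$.

It is clear that $W^{\mathrm{res},i}_{j,k}$ is a Wiener process on the interval $[0,\tau^{\mathrm{res},i}_{j,k}]$, where

$$\tau^{\mathrm{res},i}_{j,k}=\sum_{(u,v):\,I_{j,k}\subsetneq I_{u,v}}\bigl(\tau^{(i)}_{u,v}-\tau^{(i-1)}_{u,v}\bigr).$$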

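By the strong Markov property, the remaining parts of the Wiener processes $W_{u,v}$ again form independent Wiener processes, which are also independent of $\{W^{\mathrm{res},1}_{j,k}(s),\,0\le s\le\tau^{\mathrm{res},1}_{j,k}\}$. Therefore, we can compose all these pieces into one Wiener process by setting

$$W^{\mathrm{res}}_{j,k}(s)=W^{\mathrm{res},1}_{j,k}(s)\qquad\text{if }0\le s\le\tau^{\mathrm{res},1}_{j,k},$$

and, for $i\ge2$,

$$W^{\mathrm{res}}_{j,k}(s)=\sum_{i'=1}^{i-1}W^{\mathrm{res},i'}_{j,k}\bigl(\tau^{\mathrm{res},i'}_{j,k}\bigr)+W^{\mathrm{res},i}_{j,k}\bigl(s-\tau^{\mathrm{res},1}_{j,k}-\cdots-\tau^{\mathrm{res},i-1}_{j,k}\bigr)\qquad\text{if }\tau^{\mathrm{res},1}_{j,k}+\cdots+\tau^{\mathrm{res},i-1}_{j,k}<s\le\tau^{\mathrm{res},1}_{j,k}+\cdots+\tau^{\mathrm{res},i}_{j,k}.$$

An analogous construction can be made for the $\tilde\tau^{(i)}_{u,v}$, leading to Wiener processes $\widetilde W^{\mathrm{res}}_{j,k}$.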

If both $\sum_{i=1}^m\tau^{(i)}I(X_{i-1}\in I_{j,k})\ge S_{j,k}$ and $\sum_{i=1}^m\tilde\tau^{(i)}I(Y_i\in I_{j,k})\ge S_{j,k}$ are satisfied, then



and 



i: l<i<m,Yi&Ij,k {u,v): Ij^kCln,v 
Hence, we obtain by (37), (38) and Lemma 1.2.1 in [6], page 29, that, for all $(j,k)\in\mathcal T_n$,

esssupP/o(|Z] fc - > <|Xo) 



ess sup Pfy 



></2Xo 



+ 0(m-^) =0(m~^), 
where the threshold in the first probability equals $C_\lambda(m2^{-j})^{1/4}\log m$. This completes the proof. □

6. Some auxiliary results. 

6.1. Convergence of stationary distributions. 

Lemma 6.1. Suppose that $(X_i)_{i\ge0}$ and $(X_i^{(0)})_{i\ge0}$ are stationary processes obeying (4) with autoregression functions $f$ and $f_0$, respectively, where $|f|,|f_0|\le M$. Assume that the innovations $(\varepsilon_i)_{i\ge1}$ are i.i.d. with a density $p$ such that

$$\beta=\frac12\sup_{-M\le x_1<x_2\le M}\int_{-\infty}^{\infty}|p(x-x_1)-p(x-x_2)|\,dx<1.$$

Then, for the stationary densities $\psi_f$ and $\psi_{f_0}$, it holds that

$$\int\Bigl(\sqrt{\psi_f(x)}-\sqrt{\psi_{f_0}(x)}\Bigr)^2dx\le\frac{1}{1-\beta}\sup_{u\in[0,\|f-f_0\|_\infty]}\int|p(x)-p(x-u)|\,dx.$$



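Proof. We denote by $p^f(x|y)=p(x-f(y))$ and $p^{f_0}(x|y)=p(x-f_0(y))$ the transition densities of the processes $(X_i)_{i\ge0}$ and $(X_i^{(0)})_{i\ge0}$, respectively. It holds that

$$\int\Bigl(\sqrt{\psi_f(x)}-\sqrt{\psi_{f_0}(x)}\Bigr)^2dx\le\int_{-\infty}^{\infty}|\psi_f(x)-\psi_{f_0}(x)|\,dx.$$

Let, for brevity, $\Delta_{f,f_0}(x)=\psi_f(x)-\psi_{f_0}(x)$. From

$$\Delta_{f,f_0}(x)=\int\bigl[p^f(x|y)-p^{f_0}(x|y)\bigr]\psi_f(y)\,dy+\int p^{f_0}(x|y)\,\Delta_{f,f_0}(y)\,dy$$

we deduce that

$$
\begin{aligned}
\int|\Delta_{f,f_0}(x)|\,dx
\le{}&\int\Bigl|\int\bigl[p^f(x|y)-p^{f_0}(x|y)\bigr]\psi_f(y)\,dy\Bigr|\,dx+\int\Bigl|\int p^{f_0}(x|y)[\Delta_{f,f_0}(y)]_+\,dy-\int p^{f_0}(x|y)[\Delta_{f,f_0}(y)]_-\,dy\Bigr|\,dx\\
\le{}&\int\psi_f(y)\int|p^f(x|y)-p^{f_0}(x|y)|\,dx\,dy+\sup_{y_1,y_2}\int|p^{f_0}(x|y_1)-p^{f_0}(x|y_2)|\,dx\int[\Delta_{f,f_0}(y)]_+\,dy\\
\le{}&\sup_y\int|p^f(x|y)-p^{f_0}(x|y)|\,dx+\sup_{y_1,y_2}\int|p^{f_0}(x|y_1)-p^{f_0}(x|y_2)|\,dx\int[\Delta_{f,f_0}(y)]_+\,dy.
\end{aligned}
$$

In the second step we used that $\int\Delta_{f,f_0}(y)\,dy=0$, so that $\int[\Delta_{f,f_0}]_+=\int[\Delta_{f,f_0}]_-$ and the difference of the two inner integrals equals $\bigl(\int[\Delta_{f,f_0}]_+\bigr)^{-1}\iint\bigl[p^{f_0}(x|y_1)-p^{f_0}(x|y_2)\bigr][\Delta_{f,f_0}(y_1)]_+[\Delta_{f,f_0}(y_2)]_-\,dy_1\,dy_2$. Since $p^f(x|y)-p^{f_0}(x|y)=p(x-f(y))-p(x-f_0(y))$, a change of variables bounds the first supremum by $\sup_{0\le u\le\|f-f_0\|_\infty}\int|p(x)-p(x-u)|\,dx$; moreover, $\int[\Delta_{f,f_0}(y)]_+\,dy=\frac12\int|\Delta_{f,f_0}(y)|\,dy$. The latter implies

$$\int|\Delta_{f,f_0}(x)|\,dx\le\sup_{0\le u\le\|f-f_0\|_\infty}\int|p(x)-p(x-u)|\,dx+\frac12\sup_{y_1,y_2}\int|p^{f_0}(x|y_1)-p^{f_0}(x|y_2)|\,dx\int|\Delta_{f,f_0}(y)|\,dy.$$

Because $|f_0|\le M$, the factor in front of the last integral is at most $\beta$. Rearranging the terms we obtain the assertion. □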
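For Gaussian innovations the quantities on the right-hand side of Lemma 6.1 are explicit: if $p$ is the standard normal density, then $\int|p(x)-p(x-u)|\,dx=2(2\Phi(|u|/2)-1)$, which is increasing in $|u|$, so that $\beta=2\Phi(M)-1<1$ and the bound equals $2(2\Phi(\|f-f_0\|_\infty/2)-1)/(1-\beta)$. The following Python sketch evaluates these expressions and checks the closed form against a midpoint-rule integral; the Gaussian choice, the helper names and the values of $M$ and $\|f-f_0\|_\infty$ are illustrative.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def l1_shift(u):
    """Closed form of the integral of |p(x) - p(x - u)| for the standard normal density."""
    return 2.0 * (2.0 * Phi(abs(u) / 2.0) - 1.0)

def l1_shift_numeric(u, lo=-12.0, hi=12.0, n=200_000):
    """Midpoint-rule approximation of the same integral."""
    h = (hi - lo) / n
    return h * sum(abs(p(lo + (i + 0.5) * h) - p(lo + (i + 0.5) * h - u)) for i in range(n))

M, sup_dist = 1.0, 0.2                        # |f|, |f0| <= M and ||f - f0||_inf = 0.2
beta = 0.5 * l1_shift(2.0 * M)                # the largest shift x2 - x1 is 2M
bound = l1_shift(sup_dist) / (1.0 - beta)     # right-hand side of Lemma 6.1
print(f"beta = {beta:.4f}, bound = {bound:.4f}")
```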
6.2. An analogue of Berbee's lemma.

Definition 6.1. The uniform $\phi$-mixing coefficient between random variables $\xi$ and $\eta$ is defined to be the number

$$\phi(\xi,\eta)=\sup\bigl\{|P(A)-P(A|B)|:\ A\in\sigma(\xi),\ B\in\sigma(\eta),\ P(B)\ne0\bigr\}.$$

Lemma 6.2. Suppose that $\xi$ and $\eta$ are two random variables with values in $\mathbb R^1$ and $\mathbb R^d$, respectively, given on the probability space $(\Omega,\mathcal F,P)$. Furthermore, we assume that $\xi$ and $\eta$ possess a joint density and that the probability space is rich enough for the definition of a random variable $\Lambda\sim\mathrm{Uniform}[0,1]$ which is independent of $\xi$ and $\eta$. Then we can construct a random variable $\tilde\xi=\tilde\xi(\xi,\eta,\Lambda)$ such that:

(i) $\mathcal L(\tilde\xi\,|\,\eta)=\mathcal L(\xi)$ a.s., that is, $\tilde\xi$ is independent of $\eta$ and has the same distribution as $\xi$,

(ii) $P(\tilde\xi\ne\xi\,|\,\eta)\le\phi(\xi,\eta)$ a.s.
Proof. The idea of the proof is of course closely related to that of the proof of Theorem 2 in [1]. However, since the formulation of our result differs slightly from theirs (they constructed $\tilde\xi$ in such a way that it is close to $\xi$ with high probability, whereas we are interested in an exact coincidence of $\xi$ and $\tilde\xi$), we decided not to omit this proof.

We denote by $p_\xi(\cdot)$ the marginal density of $\xi$ and by $p_{\xi|\eta}(\cdot|y)$ the conditional density of $\xi$ given $\eta=y$. Define

$$\phi_y=\frac12\int|p_\xi(x)-p_{\xi|\eta}(x|y)|\,dx=1-\int p_\xi(x)\wedge p_{\xi|\eta}(x|y)\,dx.$$

Then $\phi_\eta\le\phi(\xi,\eta)$ a.s.

If $\phi_\eta=0$, then $p_\xi(\cdot)$ and $p_{\xi|\eta}(\cdot|\eta)$ coincide and we set $\tilde\xi=\xi$. Otherwise we proceed as follows. With a random variable $\Lambda\sim\mathrm{Uniform}[0,1]$ which is independent of $\xi$ and $\eta$, we set

$$\tilde\xi=\tilde\xi(\xi,\eta,\Lambda)=\begin{cases}\xi, & \text{if }p_\xi(\xi)\ge\Lambda\,p_{\xi|\eta}(\xi|\eta),\\ \zeta, & \text{otherwise,}\end{cases}$$

where $\zeta$ is an appropriate random variable having the density $[p_\xi(\cdot)-p_\xi(\cdot)\wedge p_{\xi|\eta}(\cdot|\eta)]/\phi_\eta$. The random variable $\zeta$ is defined via a quantile transform based on the distribution function

$$G_\eta(y)=\frac{1}{\phi_\eta}\int_{-\infty}^{y}\bigl[p_\xi(x)-p_{\xi|\eta}(x|\eta)\bigr]_+\,dx.$$

Now we have

$$P(\tilde\xi=\xi\,|\,\eta)\ge P\bigl(p_{\xi|\eta}(\xi|\eta)\wedge p_\xi(\xi)\ge\Lambda\,p_{\xi|\eta}(\xi|\eta)\,\big|\,\eta\bigr)=\int p_{\xi|\eta}(x|\eta)\wedge p_\xi(x)\,dx=1-\phi_\eta,$$

which implies (ii). Part (i) follows from the construction. □
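The construction in the proof of Lemma 6.2 is a rejection step followed by a draw from the residual density. For discrete distributions the joint law of $(\xi,\tilde\xi)$ given $\eta=y$ can be computed exactly, which makes both properties visible. The following Python sketch does this with exact rational arithmetic; probability mass functions stand in for the densities $p_\xi$ and $p_{\xi|\eta}(\cdot|y)$ (an assumption of the illustration, since the lemma is stated for densities), and the function name and example numbers are arbitrary. It confirms that $\tilde\xi$ has the marginal law of $\xi$ and that $P(\tilde\xi\ne\xi\,|\,\eta=y)=\phi_y$.

```python
from fractions import Fraction

def coupling_law(p_marg, p_cond):
    """Exact joint law of (xi, xi_tilde) given eta = y.

    p_marg: pmf of xi; p_cond: conditional pmf of xi given eta = y (same support).
    xi ~ p_cond is kept if Lambda * p_cond(xi) <= p_marg(xi); otherwise xi_tilde is
    drawn from the residual pmf (p_marg - min(p_marg, p_cond)) / phi_y.
    """
    phi_y = sum(p_marg[x] - min(p_marg[x], p_cond[x]) for x in p_marg)
    joint = {}
    for x, qx in p_cond.items():
        if qx == 0:
            continue
        keep = min(Fraction(1), p_marg[x] / qx)      # P(Lambda * qx <= p_marg(x))
        joint[(x, x)] = joint.get((x, x), 0) + qx * keep
        if phi_y > 0 and keep < 1:
            for z, pz in p_marg.items():
                r = (pz - min(pz, p_cond[z])) / phi_y   # residual pmf at z
                if r:
                    joint[(x, z)] = joint.get((x, z), 0) + qx * (1 - keep) * r
    return phi_y, joint

p_marg = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
p_cond = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
phi_y, joint = coupling_law(p_marg, p_cond)
law_tilde = {z: sum(w for (_, zz), w in joint.items() if zz == z) for z in p_marg}
mismatch = sum(w for (x, z), w in joint.items() if x != z)
print(phi_y, law_tilde, mismatch)
```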

6.3. An exponential inequality. We made use of the following inequality 
whose proof can be found in [15]. 

Lemma 6.3. Let $\xi$ be a random variable such that $E\xi=0$ and $|\xi|\le a$, for some positive constant $a$. Then

$$E\exp(\lambda\xi)\le\exp(c\lambda^2E\xi^2),\qquad|\lambda|\le1,$$

where $c=e^a/2$.
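Lemma 6.3 is easy to probe numerically for centred random variables with finitely many values. The bound follows from $|\lambda\xi|^k\le\lambda^2\xi^2a^{k-2}$ for $|\lambda|\le1$, $k\ge2$, together with $\sum_{k\ge2}a^{k-2}/k!\le e^a/2$. The following Python sketch evaluates both sides of the inequality on a grid of $\lambda$ values; the function name, the two example distributions and the grid are arbitrary choices.

```python
import math

def check_lemma_6_3(values, probs, a, lambdas):
    """Return True if E exp(lam * xi) <= exp(c lam^2 E xi^2), c = e^a / 2, for all lam."""
    mean = sum(v * q for v, q in zip(values, probs))
    if abs(mean) > 1e-12 or max(abs(v) for v in values) > a:
        raise ValueError("xi must be centred and bounded by a")
    second_moment = sum(v * v * q for v, q in zip(values, probs))
    c = math.exp(a) / 2.0
    return all(
        sum(q * math.exp(lam * v) for v, q in zip(values, probs))
        <= math.exp(c * lam * lam * second_moment) * (1 + 1e-12)   # rounding slack
        for lam in lambdas)

lambdas = [i / 50.0 for i in range(-50, 51)]          # grid in [-1, 1]
print(check_lemma_6_3([-1.0, 1.0], [0.5, 0.5], 1.0, lambdas),
      check_lemma_6_3([-2.0, 1.0], [1 / 3, 2 / 3], 2.0, lambdas))
```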